
Opinion Mining

James G. Shanahan, Jimi@clairvoyancecorp.com, Clairvoyance Corporation, Pittsburgh, PA. Clairvoyance Corporate Research: spun off from Carnegie Mellon University in 1992, acquired by Justsystem (Japan) in 1996.


Presentation Transcript


  1. Opinion Mining James G. Shanahan Jimi@clairvoyancecorp.com Clairvoyance Corporation Pittsburgh, PA

  2. Clairvoyance Corporate Research • CC spun off from Carnegie Mellon University in 1992, acquired by Justsystem (Japan) in 1996 • 4 P Research Philosophy • (Pertinence), Profit, Pain Killer • Patent • Prototype • Publish • Pertinence • Corporate knowledge/information management • High performance information retrieval • Cross language information retrieval • Machine Learning • Ontologies

  3. Opinion Mining Outline • Background • Monolingual Opinion Mining • Multilingual Opinion Mining • Conclusions

  4. Opinion Mining Motivation and Background • Current information management systems operate at a low level with only some semantics • Much of product feedback is web-based • provided by customers/critics online through websites, discussion boards, mailing lists, blogs, and CRM portals • Market research is becoming unwieldy • Sources are heterogeneous and, increasingly, multilingual in nature

  5. Examples of Opinion on WWW

  6. Examples of Opinion on WWW

  7. Amazon.co.jp

  8. Affect in a Reporting Point of View “Microsoft Togetherness” Economist, January 22–28th, 2000, Business There is both more and less than meets the eye to the decision of Bill Gates to pass the chief executive’s mantle to his best friend, Steve Ballmer. It is still business as usual at the world’s biggest software company. … Nor does the move presage a change in strategy. A belligerent Mr Ballmer reaffirmed the company’s hardline approach to defending the continuing antitrust action, predictably describing the break-up of the company that the government is rumoured to favour as reckless and irresponsible. Although Mr Gates spoke excitedly about Next Generation Windows Services (NGWS), a new idea that he would be working on, it is, in effect, just an ugly umbrella name for the grand Internet strategy under development at Redmond for some time. …

  9. CRM: Support Desk Inquiries I spoke today with an hp technican and he really upset me. He told me that sj 4100 (usb) will be not supported. There won't be any patches. Can someone confirm that because I'm really pissed off.

  10. Monolingual Opinion Mining [Diagram: WWW, E-opinion Sites, CRM → Opinion Spider → Opinion Classifier → Opinion Aggregator → "Product XYZ: Very Popular in US"]

  11. Monolingual Opinion Mining • Build a positive and negative opinion classifier • Spider web for opinion on a product/person • Go to well-known opinion sites • Search web for related text • Classify each piece of text • Aggregate classifications
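The classify-then-aggregate steps on this slide can be sketched as follows. This is only an illustration: the cue-word sets, the `classify` and `aggregate` function names, and the hard-coded review texts are all assumptions standing in for a trained classifier and for text gathered by the spider.

```python
from collections import Counter

# Hypothetical cue words; a real system would use a trained opinion classifier.
POSITIVE = {"great", "love", "excellent", "reliable"}
NEGATIVE = {"upset", "broken", "terrible", "unsupported"}


def classify(text: str) -> str:
    """Toy stand-in for the opinion classifier: compares cue-word counts."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score >= 0 else "negative"


def aggregate(texts):
    """Classify each piece of text and return the share of each label."""
    counts = Counter(classify(t) for t in texts)
    total = sum(counts.values())
    if total == 0:
        return {"positive": 0.0, "negative": 0.0}
    return {label: counts[label] / total for label in ("positive", "negative")}


# Stand-in for text collected by the spider (hard-coded for the example).
reviews = [
    "I love this scanner, excellent value.",
    "The technician really upset me.",
    "Great printer and reliable too.",
    "Driver is broken and unsupported.",
]
summary = aggregate(reviews)
print(summary)
```

The aggregate here is a simple share of positive and negative texts; the slides do not say how the real aggregator weights its sources.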

  12. Outline • Background • Monolingual Opinion Mining • Multilingual Opinion Mining • Conclusions

  13. Monolingual Opinion Mining • Lexicon-based approaches • Supervised learning approaches • Mixed Learning approaches • Hybrid approaches Human input

  14. Monolingual Opinion Mining • Lexicon-based approaches • Supervised learning approaches • Mixed Learning approaches • Hybrid approaches Human input

  15. Lexicon-based Approaches • Human provides linguistic resources • One or more linguists characterize each word along two dimensions: centrality (the degree to which a word belongs to an affect category) and intensity (the strength of the affect level described by that entry) [Subasic and Huettner, 2000; Liu et al. 2003; Sano 2003; Tong 2001] • Liu et al. explore the use of the Open Mind Commonsense database as a means of constructing a model for measuring the affective qualities of emails. This model is based upon a manually constructed six-state affect lexicon. [Liu et al. 2003]

  16. Clairvoyance Fuzzy Affect Lexicon [Subasic and Huettner, 2000] • Entry format: <lexical entry> <POS> <category> <centrality> <intensity> • Example: "arrogance" sn "superiority" 0.7 0.9

  17. Categories/Scales (Category / Opposing Category) • absurdity / reasonableness • advantage / disadvantage • amity / anger • attraction / repulsion • avoidance / desire • boredom / excitement • clarity / confusion • conflict / cooperation • courage / fear • creation / destruction • crime / public-spiritedness • death • deception / honesty • desire / avoidance • … [83 Total]

  18. POS, Centralities, Intensities "emasculate" vb "weakness" "emasculate" vb "lack" "emasculate" vb "violence" 0.7 0.8 0.4 0.9 0.3 0.4

  19. Subjective Judgments Give Ranks

  20. Distribution of Affect Typing

  21. Fuzzy Thesaurus • attraction, love: 0.80 • admiration sn attraction 0.80 0.50 • admire vb attraction 0.80 0.50 • … dazzle vb attraction 0.80 0.90 ... • magnetism sn attraction 1.00 0.50 • adoration sn love 0.90 1.00 • adore vb love 0.90 1.00 • … dazzle vb love 0.90 1.00 ... • passionate adj love 0.70 0.90

  22. Fuzzy Tagging
  Lexicon entries (word, POS, category, centrality, intensity):
  macabre,adj,death,0.50,0.60
  macabre,adj,horror,0.90,0.60
  ...
  savage,adj,violence,1.00,1.00
  ...
  secret,sn,slyness,0.50,0.50
  secret,sn,deception,0.50,0.50
  prosperous,adj,surfeit,0.50,0.50
  rat,sn,disloyalty,0.30,0.90
  rat,sn,horror,0.20,0.60
  rat,sn,repulsion,0.60,0.70
  ...
  portent,sn,promise,0.70,0.90
  portent,sn,warning,1.00,0.80
  ...
  surrealistic,adj,absurdity,0.80,0.50
  surrealistic,adj,creation,0.30,0.40
  surrealistic,adj,insanity,0.50,0.30
  surrealistic,adj,surprise,0.30,0.30
  success,sn,success,1.00,0.60
  whisper,vb,slyness,0.40,0.50
  whisper,vb,slander,0.40,0.40
  ...
  greed,sn,desire,0.60,1.00
  greed,sn,greed,1.00,0.70
  lust,vb,desire,0.80,0.90
  envy,sn,desire,0.7,0.6
  envy,sn,greed,0.7,0.6
  envy,sn,inferiority,0.4,0.4
  envy,sn,lack,0.5,0.5
  envy,sn,slyness,0.5,0.6
  fill,sn,surfeit,0.70,0.40
  Text: Luis Bunuel's The Exterminating Angel (1962) is a macabre comedy, a mordant view of human nature that suggests we harbor savage instincts and unspeakable secrets. Take a group of prosperous dinner guests and pen them up long enough, he suggests, and they'll turn on one another like rats in an overpopulation study. Bunuel begins with small, alarming portents. The cook and the servants suddenly put on their coats and escape, just as the dinner guests are arriving. The hostess is furious; she planned an after-dinner entertainment involving a bear and two sheep. Now it will have to be canceled. It is typical of Bunuel that such surrealistic touches are dropped in without comment. The dinner party is a success. The guests whisper slanders about each other, their eyes playing across the faces of their fellow guests with greed, lust and envy. After dinner, they stroll into the drawing room, where we glimpse a woman's purse, filled with chicken feathers and rooster claws.
  Affect categories for the text: violence 1.0 • humor 1.0 • warning 1.0 • anger 1.0 • success 1.0 • slander 1.0 • greed 1.0 • horror 0.90 • aversion 0.90 • absurdity 0.80 • excitement 0.80 • desire 0.80 • pleasure 0.70 • promise 0.70 • surfeit 0.70 • repulsion 0.60 • fear 0.60 • lack 0.50 • death 0.50 • slyness 0.50 • intelligence 0.50 • deception 0.50 • insanity 0.50 • clarity 0.40 • innocence 0.40 • inferiority 0.40 • pain 0.30 • disloyalty 0.30 • failure 0.30 • creation 0.30 • surprise 0.30
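As a rough illustration of how such a profile could be derived, the sketch below tags a list of lemmas against a subset of the lexicon entries on this slide and keeps the maximum centrality per category. Taking the maximum is an assumption, chosen because it reproduces the slide's values for the categories covered here (for example desire 0.80 from lust, greed and envy); the `affect_profile` name and the pre-lemmatized input are also hypothetical.

```python
# Subset of the slide's lexicon: (word, POS, category, centrality, intensity).
LEXICON = [
    ("macabre", "adj", "death", 0.50, 0.60),
    ("macabre", "adj", "horror", 0.90, 0.60),
    ("savage", "adj", "violence", 1.00, 1.00),
    ("rat", "sn", "horror", 0.20, 0.60),
    ("rat", "sn", "repulsion", 0.60, 0.70),
    ("greed", "sn", "desire", 0.60, 1.00),
    ("greed", "sn", "greed", 1.00, 0.70),
    ("lust", "vb", "desire", 0.80, 0.90),
    ("envy", "sn", "desire", 0.70, 0.60),
    ("envy", "sn", "greed", 0.70, 0.60),
]


def affect_profile(lemmas, lexicon=LEXICON):
    """Map each affect category to the highest centrality among matched words.

    Assumption: centralities are combined with max (a fuzzy union).
    """
    present = set(lemmas)
    profile = {}
    for word, _pos, category, centrality, _intensity in lexicon:
        if word in present:
            profile[category] = max(profile.get(category, 0.0), centrality)
    return profile


# Lemmas assumed to come from an upstream tokenizer/lemmatizer (not shown).
lemmas = ["macabre", "comedy", "savage", "instinct", "rat", "greed", "lust", "envy"]
profile = affect_profile(lemmas)
for category, centrality in sorted(profile.items(), key=lambda kv: -kv[1]):
    print(f"{category} {centrality:.2f}")
```

Intensity is carried in the entries but not combined here, since the slide lists only one value per category.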

  23. Assessing Affect (In N Dimensions) [Chart: axes Excitement, Humor, Intelligence, Love, Fear; values shown: 0.6, 0.7, 0.2, 0.4, 0.8]

  24. Affect Selective Visualization Luis Bunuel's The Exterminating Angel (1962) is a macabre comedy, a mordant view of human nature that suggests we harbor savage instincts and unspeakable secrets. Take a group of prosperous dinner guests and pen them up long enough, he suggests, and they'll turn on one another like rats in an overpopulation study. Bunuel begins with small, alarming portents. The cook and the servants suddenly put on their coats and escape, just as the dinner guests are arriving. The hostess is furious; she planned an after-dinner entertainment involving a bear and two sheep. Now it will have to be canceled. It is typical of Bunuel that such surrealistic touches are dropped in without comment. The dinner party is a success. The guests whisper slanders about each other, their eyes playing across the faces of their fellow guests with greed, lust and envy. After dinner, they stroll into the drawing room, where we glimpse a woman's purse, filled with chicken feathers and rooster claws.

  25. Affect Total Visualization

  26. Fuzzy Retrieval Retrieve D1

  27. Lexicon-based Approaches • Human provides linguistic resources • One or more linguists characterize each word along two dimensions: centrality (the degree to which a word belongs to an affect category) and intensity (the strength of the affect level described by that entry) [Subasic and Huettner, 2000; Liu et al. 2003; Sano 2003] • Liu et al. explore the use of the Open Mind Commonsense database as a means of constructing a model for measuring the affective qualities of emails. This model is based upon a manually constructed six-state affect lexicon. [Liu et al. 2003] • These approaches, while interesting, are labor intensive and can be vulnerable to error and high maintenance costs. • However, the lexicon can be grown automatically • For example, with PMI (semantic orientation by association)

  28. Monolingual Opinion Mining • Lexicon-based approaches • Supervised learning approaches • Mixed Learning approaches • Hybrid approaches Human input

  29. Opinion Classifier Requirements • A labeled database of opinion • Download ratings from Amazon.com, epinions.com etc. • Build a binary opinion classifier • From positive and negative ratings • Merge 1 and 2 stars to negative and 3, 4 and 5 to positive • Using a thresholded SVM (support vector machine)
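The star-merging rule on this slide can be written down directly. A minimal sketch: the `label_from_stars` name and the sample ratings are made up for illustration, and the thresholded SVM that would be trained on the resulting labels is not shown.

```python
def label_from_stars(stars: int) -> str:
    """Merge 1 and 2 stars into 'negative' and 3, 4 and 5 into 'positive'."""
    if not 1 <= stars <= 5:
        raise ValueError(f"star rating out of range: {stars}")
    return "negative" if stars <= 2 else "positive"


# Hypothetical downloaded ratings as (stars, review text) pairs.
ratings = [(5, "Works perfectly"), (1, "Stopped working"), (3, "It is okay")]
labeled = [(text, label_from_stars(stars)) for stars, text in ratings]
print(labeled)
```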

  30. Supervised learning approaches • Generating systems automatically for affect and opinion modeling. • Pang et al.’s work on classifying movie ratings (Pang et al. 2002). • Machine learning and information retrieval approaches were compared for the task of product ratings classification (Dave et al. 2003). • Das and Chen used a classifier on investor bulletin boards to see if apparently positive ratings are correlated with positive stock price. • Shanahan et al. use a thresholded SVM (support vector machine) [Shanahan et al. 2003] • Require labeled data 

  31. Monolingual Opinion Mining • Lexicon-based approaches • Supervised learning approaches • Mixed Learning approaches • Hybrid approaches Human input

  32. Mixed Learning approaches • Learning semantic orientation and intensity of terms • A word is characterised by the company it keeps [Firth 1957] • Turney, P.D., and Littman, M.L. (2003), Measuring praise and criticism: Inference of semantic orientation from association, ACM Transactions on Information Systems (TOIS), 21 (4), 315-346 • Clustering and classification [Hatzivassiloglou and McKeown 1997] • “Extending affect lexicon”, AAAI EAAT Spring Symposium 2003, Grefenstette, Evans, Qu, Shanahan • Potentially a very cheap, general and powerful approach • Needs further evaluation

  33. Mixed Learning: Semantic Orientation by Association • “A word is characterised by the company it keeps” [Firth 1957] • Each word is characterised by its orientation (nice, nasty) and intensity (okay, fabulous) • Provide a set of labeled positive (Pwords) and negative (Nwords) oriented words • The semantic orientation (i.e., positive or negative) of a word is calculated from the strength of its association with a set of positive words, minus the strength of its association with a set of negative words

  34. Labeled Semantic Orientation Words • Pwords = • {good, nice, excellent, positive, fortunate, correct, superior} • Nwords = • {bad, nasty, poor, negative, unfortunate, wrong, inferior}. • Labeled words provided by linguist • The sets consist of opposing pairs • Words are insensitive to context (i.e., almost always have the same meaning)

  35. Semantic Orientation by Association • Various approaches to calculating the semantic association of two words • Pointwise Mutual Information (PMI) [Church and Hanks 1989] • Latent Semantic Indexing (LSI) [Dumais et al. 1990] • Likelihood Ratios [Dunning 1993]

  36. Semantic Orientation: Pointwise Mutual Information (PMI) • Introduced by [Church and Hanks 1989] • PMI is a form of correlation measure • Positive when two words co-occur more often than chance would predict, negative when they co-occur less often • Given two words, word1 and word2, PMI compares their joint probability with the product of their individual probabilities • Pwords = {good, nice, excellent, positive, fortunate, correct, superior} • Nwords = {bad, nasty, poor, negative, unfortunate, wrong, inferior}
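The formula itself did not survive in the transcript. The standard definition of PMI, and the semantic orientation built from it as described on slide 33, can be written as follows. The notation is an assumption; the slide's own symbols are not recoverable.

```latex
\mathrm{PMI}(word_1, word_2)
  = \log_2 \frac{p(word_1 \,\&\, word_2)}{p(word_1)\, p(word_2)}

\mathrm{SO}(word)
  = \sum_{pword \in Pwords} \mathrm{PMI}(word, pword)
  - \sum_{nword \in Nwords} \mathrm{PMI}(word, nword)
```

A word with SO above zero is oriented positively, and the magnitude of SO can be read as its intensity.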

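  37. Pointwise Mutual Information (PMI) • PMI can be calculated by issuing queries to a search engine • Given two words, word1 and word2, the probabilities are estimated from the hit counts for each word alone and for the two words occurring together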
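A minimal sketch of the hit-count estimate follows. No real search engine is queried: the `HITS` and `NEAR_HITS` tables hold invented counts, the seed sets are shortened to two words each, and the 0.01 smoothing constant is an assumption to avoid taking the log of zero. The hit count of the target word and the corpus size cancel in the difference, so they do not appear.

```python
import math

# Shortened seed sets (the slides use seven words per set).
PWORDS = ["good", "excellent"]
NWORDS = ["bad", "poor"]

# Invented hit counts standing in for search-engine results.
HITS = {"good": 1000, "excellent": 500, "bad": 800, "poor": 400}
NEAR_HITS = {
    ("superb", "good"): 50, ("superb", "excellent"): 40,
    ("superb", "bad"): 5, ("superb", "poor"): 2,
    ("awful", "good"): 4, ("awful", "excellent"): 1,
    ("awful", "bad"): 60, ("awful", "poor"): 30,
}
SMOOTHING = 0.01  # assumed constant so that zero counts stay finite


def association(word: str, seed: str) -> float:
    """PMI(word, seed) up to terms that cancel in the SO difference."""
    near = NEAR_HITS.get((word, seed), 0) + SMOOTHING
    return math.log2(near) - math.log2(HITS[seed] + SMOOTHING)


def semantic_orientation(word: str) -> float:
    """Association with the positive seeds minus association with the negative seeds."""
    positive = sum(association(word, p) for p in PWORDS)
    negative = sum(association(word, n) for n in NWORDS)
    return positive - negative


for w in ("superb", "awful"):
    print(w, round(semantic_orientation(w), 2))
```

With these invented counts "superb" scores above zero and "awful" below it, which is the sign behaviour the slides describe.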

  38. Semantic Orientation Results

  39. Semantic Orientation by Association Summary • A word is characterised by the company it keeps [Firth 1957] • Bootstrap from 14 paradigm words • Use LSA or pointwise mutual information (PMI) to calculate the SO and intensity of expressions/sentences/documents • Very encouraging results, but needs further evaluation

  40. Hybrid Approach • The Clairvoyance Affect lexicon is incomplete • 4,000 entries • Automatically compute the orientation and intensity of unseen terms using PMI • “Extending affect lexicon”, AAAI EAAT Spring Symposium 2003, Grefenstette, Evans, Qu, Shanahan

  41. Commercial Efforts • Commercial efforts (lexicon-based) • Justsystem’s CB Market Intelligence system, which organizes feedback data in an affect map (Sano 2003). • NEC’s SurveyAnalyzer, which mines the reputations of products (Morinaga et al. 2003). • SPSS’s TextSmart • Other Applications • Opinion Timelines • Flame control • Emails/chat/communication • Newsgroups • Directed Search • Survey analysis • CRM

  42. Outline • Background • Monolingual Opinion Mining • Multilingual Opinion Mining • Conclusions

  43. Opinion Mining Motivation and Background • Much of product feedback is web-based • provided by customers/critics online through websites, discussion boards, mailing lists, blogs, and CRM • Market research is becoming unwieldy • Sources are heterogeneous and, increasingly, multilingual in nature

  44. Multilingual Opinion Mining • The approaches presented above all focus on the monolingual aspects of affect and opinion. • Multilingual Opinion Mining • Practical Solution • Build a classifier for each language from labeled data • Research Solution • Examine a combination of classification and translation to build opinion classifiers

  45. Monolingual Opinion Mining [Diagram: WWW, E-opinion Sites → Opinion Spider → Opinion Classifier → Opinion Aggregator → "Product XYZ: Very Popular in US"]

  46. Opinion Classifier Requirements • A labeled database of opinion • Download ratings from Amazon.com, epinions.com etc. • Build a binary opinion classifier • From positive and negative ratings • Merge 1 and 2 stars to negative and 3, 4 and 5 to positive • Using an SVM (support vector machine)

  47. Multilingual Opinion Mining (0) • For each language build a corresponding monolingual opinion classifier
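The one-classifier-per-language setup can be sketched as a lookup from language code to classifier, followed by a per-language aggregate. Everything named here is hypothetical: the cue-word sets stand in for classifiers trained on labeled data in each language, and the language of each document is assumed to be known in advance.

```python
from collections import defaultdict

# Hypothetical cue words per language, standing in for trained classifiers.
CUES = {
    "en": ({"good", "excellent"}, {"bad", "poor"}),
    "ja": ({"良い", "最高"}, {"悪い", "最悪"}),
}


def make_classifier(positive, negative):
    """Build a toy classifier that compares substring counts of cue words."""
    def classify(text: str) -> str:
        score = sum(text.count(w) for w in positive) - sum(text.count(w) for w in negative)
        return "positive" if score >= 0 else "negative"
    return classify


CLASSIFIERS = {lang: make_classifier(pos, neg) for lang, (pos, neg) in CUES.items()}


def positive_share_by_language(documents):
    """Route each (language, text) pair to its classifier and aggregate per language."""
    counts = defaultdict(lambda: [0, 0])  # language -> [positive count, total count]
    for lang, text in documents:
        label = CLASSIFIERS[lang](text)
        counts[lang][0] += int(label == "positive")
        counts[lang][1] += 1
    return {lang: pos / total for lang, (pos, total) in counts.items()}


docs = [("en", "excellent product"), ("en", "good value"), ("ja", "最悪"), ("ja", "良い")]
shares = positive_share_by_language(docs)
print(shares)
```

Keeping the aggregate per language is what allows a summary such as popular in one market but not in another.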

  48. Multilingual Opinion Mining [Diagram: WWW, E-opinion Sites → Opinion Spider → Opinion Classifier (Chinese, English, Japanese) → Opinion Aggregator → "Product XYZ: Very Popular in US, Not so Popular in Japan, Overall so-so world popularity"]

  49. Multilingual Opinion Mining (1): Classification + Translation [Diagram: Labeled Ratings → Translate Corpus (Lang 1, Lang 2, Lang 3) → Train Classifiers (Classifier1, Classifier2, Classifier3) → Mine Opinion (WWW) → "Product XYZ: Popular in US, Popular in Japan"]

  50. Multilingual Opinion Mining (2): Classification + Translation [Diagram: Labeled Ratings → Train Classifiers → Translate Classifiers (Classifier1, Classifier2, Classifier3) → Mine Opinion (WWW) → "Product XYZ: Popular in US, Popular in Japan"]
