807 - TEXT ANALYTICS Massimo Poesio Lecture 4: Sentiment analysis (aka Opinion Mining)
FACTS AND OPINIONS • Two main types of textual information on the Web: FACTS and OPINIONS • Current search engines search for facts (assume they are true) • Facts can be expressed with topic keywords.
SENTIMENT ANALYSIS (also known as opinion mining) Attempts to identify the opinion/sentiment that a person may hold towards an object
Components of an opinion • Basic components of an opinion: • Opinion holder: The person or organization that holds a specific opinion on a particular object. • Object: on which an opinion is expressed • Opinion: a view, attitude, or appraisal on an object from an opinion holder.
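The three components above map naturally onto a small record type. The sketch below is only an illustration: the class name `Opinion`, its field names, and the two hand-built instances (loosely based on the iPhone review example in this lecture) are assumptions, not part of the original slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    """One opinion, following the three components listed on the slide."""
    holder: str    # person or organization holding the opinion
    target: str    # the object on which the opinion is expressed
    polarity: str  # the appraisal: "positive", "negative", or "neutral"


# Hand-built (hypothetical) instances: the same object can receive
# different opinions from different holders.
review_opinions = [
    Opinion(holder="reviewer", target="iPhone", polarity="positive"),
    Opinion(holder="reviewer's mother", target="iPhone", polarity="negative"),
]

holders = {opinion.holder for opinion in review_opinions}
```

Keeping the holder explicit matters for the iPhone review: a single post can contain opinions from more than one holder.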
SENTIMENT ANALYSIS GRANULARITY • At the document (or review) level: • Task: sentiment classification of reviews • Classes: positive, negative, and neutral • Assumption: each document (or review) focuses on a single object (not true in many discussion posts) and contains opinion from a single opinion holder. • At the sentence level: • Task 1: identifying subjective/opinionated sentences • Classes: objective and subjective (opinionated) • Task 2: sentiment classification of sentences • Classes: positive, negative and neutral. • Assumption: a sentence contains only one opinion; not true in many cases. • Then we can also consider clauses or phrases.
SENTENCE-LEVEL SENTIMENT ANALYSIS EXAMPLE Id: Abc123 on 5-1-2008 “I bought an iPhone a few days ago. It is such a nice phone. The touch screen is really cool. The voice quality is clear too. It is much better than my old Blackberry, which was a terrible phone and so difficult to type with its tiny keys. However, my mother was mad with me as I did not tell her before I bought the phone. She also thought the phone was too expensive, …”
SENTIMENT ANALYSIS GRANULARITY • At the feature level: • Task 1: Identify and extract object features that have been commented on by an opinion holder (e.g., a reviewer). • Task 2: Determine whether the opinions on the features are positive, negative or neutral. • Task 3: Group feature synonyms. • Produce a feature-based opinion summary of multiple reviews. • Opinion holders: identifying holders is also useful, e.g., in news articles, but in user-generated content they are usually known (they are the authors of the posts).
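Once Tasks 1 and 2 have produced (feature, polarity) pairs, Task 3 and the final summary amount to grouping and counting. The following is a minimal sketch under that assumption; the synonym map `FEATURE_SYNONYMS`, the function `summarize`, and the sample pairs are all hypothetical and stand in for the output of real extraction steps.

```python
from collections import Counter, defaultdict

# Hypothetical synonym map (Task 3): surface form -> canonical feature name.
FEATURE_SYNONYMS = {
    "screen": "touch screen",
    "display": "touch screen",
    "sound": "voice quality",
}


def summarize(opinions):
    """Count polarities per canonical feature.

    opinions: iterable of (feature, polarity) pairs, assumed to come
    from Tasks 1 and 2.
    """
    summary = defaultdict(Counter)
    for feature, polarity in opinions:
        key = feature.lower()
        canonical = FEATURE_SYNONYMS.get(key, key)
        summary[canonical][polarity] += 1
    return {feature: dict(counts) for feature, counts in summary.items()}


# Invented pairs, as if extracted from several reviews.
extracted = [
    ("touch screen", "positive"),
    ("display", "positive"),
    ("screen", "negative"),
    ("voice quality", "positive"),
    ("sound", "positive"),
]
result = summarize(extracted)
```

Here `result` records two positive and one negative opinion for the touch screen, and two positive opinions for voice quality.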
Applications • Businesses and organizations: • product and service benchmarking. • market intelligence. • Businesses spend a huge amount of money to find consumer sentiments and opinions. • Consultants, surveys and focus groups, etc. • Individuals: interested in others’ opinions when • purchasing a product or using a service, • finding opinions on political topics • Ad placement: placing ads in user-generated content • Place an ad when one praises a product. • Place an ad from a competitor if one criticizes a product. • Opinion retrieval/search: providing general search for opinions.
LEXICON-BASED APPROACHES • Use sentiment and subjectivity lexicons • Rule-based classifier • A sentence is subjective if it has at least two words in the lexicon • A sentence is objective otherwise
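The rule above is simple enough to write down directly. This is a minimal sketch: the seven-word `SUBJECTIVITY_LEXICON` is a toy stand-in for a real lexicon, and the regex tokenizer and the name `is_subjective` are illustrative assumptions.

```python
import re

# Toy lexicon for illustration only; a real system would load a published
# subjectivity lexicon instead.
SUBJECTIVITY_LEXICON = {
    "nice", "cool", "terrible", "difficult", "mad", "expensive", "better",
}


def is_subjective(sentence, lexicon=SUBJECTIVITY_LEXICON, threshold=2):
    """Return True if the sentence has at least `threshold` lexicon words."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    hits = sum(1 for token in tokens if token in lexicon)
    return hits >= threshold


factual = is_subjective("I bought an iPhone a few days ago.")
comparison = is_subjective(
    "It is much better than my old Blackberry, which was a terrible phone "
    "and so difficult to type with its tiny keys."
)
one_clue = is_subjective("It is such a nice phone.")
```

With this toy lexicon, the comparison sentence is classified as subjective, while "It is such a nice phone." contains only one lexicon word and is therefore classified as objective, which shows how coarse the two-word threshold can be.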
SUPERVISED CLASSIFICATION • Treat sentiment analysis as a type of classification • Use corpora annotated for subjectivity and/or sentiment • Train machine learning algorithms: • Naïve Bayes • Decision trees • SVM • … • Learn to automatically annotate new text
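To make the train-then-annotate loop concrete, here is a small multinomial Naïve Bayes classifier written from scratch with add-one smoothing. It is a sketch, not the lecture's implementation: the function names and the four-sentence training set are invented for illustration, and a real experiment would use an annotated corpus and an established library.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """labeled_docs: list of (tokens, label). Returns the counts needed to classify."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(tokens, model):
    """Return the label with the highest log posterior."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior plus log likelihoods with add-one (Laplace) smoothing.
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokens:
            if token in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Tiny invented training set; real work would use an annotated corpus.
train = [
    ("such a nice phone".split(), "positive"),
    ("the touch screen is really cool".split(), "positive"),
    ("a terrible phone".split(), "negative"),
    ("the phone was too expensive".split(), "negative"),
]
model = train_naive_bayes(train)
label_a = classify("really nice screen".split(), model)
label_b = classify("terrible and expensive".split(), model)
```

On this toy data, "really nice screen" is labeled positive and "terrible and expensive" negative.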
FEATURES FOR SUPERVISED DOCUMENT-LEVEL SENTIMENT ANALYSIS • A large set of features has been tried by researchers (see, e.g., work here at Essex by Roseline Antai) • Term frequency and different IR weighting schemes, as in other work on classification • Part of speech (POS) tags • Opinion words and phrases • Negations • Syntactic dependency
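Two of the feature types listed above, term frequencies and negations, can be combined in a few lines. The sketch below uses a common heuristic that the slide itself does not specify: every word after a negation word is prefixed with `NOT_` until the next punctuation mark. The negation list, the tokenizer, and the name `extract_features` are assumptions made for illustration.

```python
import re
from collections import Counter

NEGATIONS = {"not", "no", "never"}   # illustrative, not exhaustive
PUNCTUATION = set(".,!?;")


def extract_features(text):
    """Term-frequency features with a simple negation-scope marker."""
    features = Counter()
    negated = False
    for token in re.findall(r"[a-z]+|[.,!?;]", text.lower()):
        if token in PUNCTUATION:
            negated = False          # punctuation closes the negation scope
        elif token in NEGATIONS:
            negated = True           # following words fall under negation
        else:
            features["NOT_" + token if negated else token] += 1
    return features


features = extract_features("I did not like the screen, but the sound is good.")
```

In this example "like" is counted as `NOT_like`, while "good", which follows the comma, is counted as a plain term, so a classifier can treat the two differently.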
EASIER AND HARDER PROBLEMS • Tweets from Twitter are probably the easiest • short and thus usually straight to the point • Reviews are next • entities are given (almost) and there is little noise • Discussions, comments, and blogs are hard • multiple entities, comparisons, noise, sarcasm, etc.
ASPECT-BASED SENTIMENT ANALYSIS • Sentiment classification at the document or sentence (or clause) level is useful, but does not find what people liked and disliked. • It does not identify the targets of opinions, i.e., ENTITIES and their ASPECTS • Without knowing targets, opinions are of limited use.
ASPECT-BASED SENTIMENT ANALYSIS • Much of the research is based on online reviews • For reviews, aspect-based sentiment analysis is easier because the entity (i.e., product name) is usually known • Reviewers simply express positive and negative opinions on different aspects of the entity. • For blogs, forum discussions, etc., it is harder: • both the entity and its aspects are unknown • there may also be many comparisons • and there is also a lot of irrelevant information.
BRIEF DIGRESSION • Regular opinions: Sentiment/opinion expressions on some target entities • Direct opinions: “The touch screen is really cool” • Indirect opinions: “After taking the drug, my pain has gone” • COMPARATIVE opinions: Comparisons of more than one entity. • “iPhone is better than Blackberry”
Find entities (entity set expansion) • Although similar, it is somewhat different from the traditional named entity recognition (NER). (See next lectures) • E.g., one wants to study opinions on phones • given Motorola and Nokia, find all phone brands and models in a corpus, e.g., Samsung, Moto,
Feature/Aspect extraction • May extract frequent nouns and noun phrases • Sometimes limited to a set known to be related to the entity of interest or using part discriminators • e.g., for a scanner entity “scanner”, “scanner has” • opinion and target relations • Proximity or syntactic dependency • Standard IE methods • Rule-based or supervised learning • Often HMMs or CRFs (like standard IE)
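The first strategy above, extracting frequent nouns and noun phrases, can be sketched as a counting pass over POS-tagged text. The code assumes the input has already been tagged with Penn Treebank-style tags (noun tags start with `NN`); the function name, the `min_count` parameter, and the three hand-tagged sentences are hypothetical.

```python
from collections import Counter


def frequent_noun_aspects(tagged_sentences, min_count=2):
    """Return noun sequences that occur at least `min_count` times.

    tagged_sentences: list of sentences, each a list of (token, pos_tag)
    pairs. Maximal runs of consecutive noun tokens are treated as one
    candidate aspect.
    """
    counts = Counter()
    for sentence in tagged_sentences:
        phrase = []
        # The sentinel pair flushes a noun run that ends the sentence.
        for token, tag in sentence + [("", "END")]:
            if tag.startswith("NN"):
                phrase.append(token.lower())
            else:
                if phrase:
                    counts[" ".join(phrase)] += 1
                phrase = []
    return [aspect for aspect, n in counts.most_common() if n >= min_count]


# Hand-tagged toy input; a real pipeline would run a POS tagger first.
tagged = [
    [("The", "DT"), ("touch", "NN"), ("screen", "NN"), ("is", "VBZ"), ("cool", "JJ")],
    [("The", "DT"), ("touch", "NN"), ("screen", "NN"), ("broke", "VBD")],
    [("The", "DT"), ("battery", "NN"), ("is", "VBZ"), ("weak", "JJ")],
]
aspects = frequent_noun_aspects(tagged)
```

With `min_count=2`, only "touch screen" survives; "battery" appears once and is filtered out as too infrequent.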
RESOURCES FOR SENTIMENT ANALYSIS • Annotated corpora • Used in statistical approaches (Hu & Liu 2004, Pang & Lee 2004) • MPQA corpus (Wiebe et al., 2005) • Tools • Algorithm based on minimum cuts (Pang & Lee, 2004) • OpinionFinder (Wiebe et al., 2005) • Lexicons • General Inquirer (Stone et al., 1966) • OpinionFinder lexicon (Wiebe & Riloff, 2005) • SentiWordNet (Esuli & Sebastiani, 2006)
Lexical resources for Sentiment and Subjectivity Analysis Overview
Sentiment-bearing words ICWSM 2008 • Adjectives (Hatzivassiloglou & McKeown 1997, Wiebe 2000, Kamps & Marx 2002, Andreevskaia & Bergler 2006) • positive: honest, important, mature, large, patient • Ron Paul is the only honest man in Washington. • Kitchell’s writing is unbelievably mature and is only likely to get better. • To humour me my patient father agrees yet again to my choice of film.
Negative adjectives • Adjectives • negative: harmful, hypocritical, inefficient, insecure • It was a macabre and hypocritical circus. • Why are they being so inefficient?
Subjective adjectives • Adjectives • Subjective (but not positive or negative sentiment): curious, peculiar, odd, likely, probable • He spoke of Sue as his probable successor. • The two species are likely to flower at different times.
Other words • Other parts of speech (Turney & Littman 2003, Riloff, Wiebe & Wilson 2003, Esuli & Sebastiani 2006) • Verbs • positive: praise, love • negative: blame, criticize • subjective: predict • Nouns • positive: pleasure, enjoyment • negative: pain, criticism • subjective: prediction, feeling
Phrases • Phrases containing adjectives and adverbs (Turney 2002, Takamura, Inui & Okumura 2007) • positive: high intelligence, low cost • negative: little variation, many troubles
Creating sentiment lexica • Humans • Semi-automatic • Fully automatic
(Semi) Automatic creation of sentiment lexica • Find relevant words, phrases, patterns that can be used to express subjectivity • Determine the polarity of subjective expressions
USING PATTERNS • Lexico-syntactic patterns (Riloff & Wiebe 2003) • way with <np>: … to ever let China use force to have its way with … • expense of <np>: at the expense of the world’s security and stability • underlined <dobj>: Jiang’s subdued tone … underlined his desire to avoid disputes …
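As a rough illustration of how such patterns can act as subjectivity clues, the sketch below checks two of them with regular expressions. This is a deliberate simplification: true lexico-syntactic patterns rely on syntactic analysis to fill the `<np>` slot, whereas here the slot is approximated by "at least one following word". The names `PATTERNS` and `matched_patterns` and the second test sentence are invented for illustration.

```python
import re

# Surface approximations of two patterns from the slide. The <np> slot is
# reduced to "one or more word characters follow", which is an assumption
# of this sketch rather than how the patterns are actually defined.
PATTERNS = {
    "way with <np>": re.compile(r"\bway with\s+\w+", re.IGNORECASE),
    "expense of <np>": re.compile(r"\bexpense of\s+\w+", re.IGNORECASE),
}


def matched_patterns(sentence):
    """Return the names of all patterns that fire on the sentence."""
    return [name for name, regex in PATTERNS.items() if regex.search(sentence)]


slide_example = matched_patterns(
    "at the expense of the world’s security and stability"
)
invented_example = matched_patterns("He always gets his way with the committee.")
no_match = matched_patterns("I bought an iPhone a few days ago.")
```

A sentence on which one or more patterns fire could then be treated as a candidate subjective sentence, in the same spirit as the lexicon-word rule.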