
Automatic Sentiment Analysis in On-line Text



Presentation Transcript


  1. Automatic Sentiment Analysis in On-line Text Erik Boiy Pieter Hens Koen Deschacht Marie-Francine Moens CS & ICRI Katholieke Universiteit Leuven

  2. Introduction • Goal: determine the sentiment of a person towards a topic • Practical use • Customer feedback • Marketing research • Monitoring newsgroups and forums (flame detection) • Augmentation of search engines (e.g. Opinmind.com) • Opportunity • Blogs • Forums • Review sites (all noisy texts)

  3. Overview • Introduction • Emotions • Machine learning (ML) techniques • Challenges • Experiments, results & discussion • Conclusion & future work

  4. Concepts of emotions • “Sentiments are either emotions, or they are judgements or ideas prompted or coloured by emotions” • An emotion • Is usually caused by a person consciously or unconsciously evaluating an event, which is called appraisal in psychology • Gives priority to one or a few kinds of action, to which it gives a sense of urgency

  5. Emotions in written text • Appraisal: evaluation • e.g. It was an amazing show. • Direct expressions • e.g. I am delighted of the final results. • Elements of actions • e.g. I was grinning the whole way through it and laughing out loud more than once.

  6. Overview • Introduction • Emotions • Machine learning (ML) techniques • Challenges • Experiments, results & discussion • Conclusion & future work

  7. ML: Document representation (1) • Feature extraction • Features are used to represent a document as a vector • Values in the vector indicate frequency or presence of the feature at the corresponding index in a dictionary • The dictionary consists of all features encountered in the training documents
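
The vector representation described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the whitespace tokenizer, the function names `build_dictionary` and `vectorize`, and the toy sentences are all assumptions made for the example.

```python
from collections import Counter

def build_dictionary(training_docs):
    """Collect every feature seen in training (here: lowercased whitespace tokens)."""
    features = sorted({tok for doc in training_docs for tok in doc.lower().split()})
    return {feat: idx for idx, feat in enumerate(features)}

def vectorize(doc, dictionary, binary=False):
    """Map a document to a vector aligned with the dictionary indices.

    binary=False stores the frequency of each feature,
    binary=True stores only its presence (0 or 1).
    """
    counts = Counter(doc.lower().split())
    vec = [0] * len(dictionary)
    for feat, n in counts.items():
        idx = dictionary.get(feat)
        if idx is not None:  # features never seen in training are dropped
            vec[idx] = 1 if binary else n
    return vec

# Toy training documents (made up for illustration).
train = ["great great show", "boring show"]
dictionary = build_dictionary(train)  # {'boring': 0, 'great': 1, 'show': 2}

print(vectorize("great great show", dictionary))               # [0, 2, 1]
print(vectorize("great great show", dictionary, binary=True))  # [0, 1, 1]
print(vectorize("amazing show", dictionary))                   # [0, 0, 1]
```

Because the dictionary is built from the training documents only, a word such as "amazing" that first appears at test time has no index and contributes nothing to the vector.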

  8. ML: Document representation (2) • Unigrams: all words • N-grams: all sequences of N successive words • N = 1: unigrams, N = 2: bigrams, N = 3: trigrams • e.g. I love, not worth, returned it • Lemmas: basic dictionary form of all words • e.g. cars -> car, was -> be, better -> good • Opinion words: use only words from a pre-defined list as features • Adjectives: use only adjectives (about 7.5% of the text)
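
N-gram extraction is easy to show in a few lines. The sketch below assumes the text is already split into tokens; the helper name `ngrams` and the example phrase are illustrative only.

```python
def ngrams(tokens, n):
    """Return all runs of n successive tokens, each joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "not worth the money".split()

print(ngrams(tokens, 1))  # ['not', 'worth', 'the', 'money']
print(ngrams(tokens, 2))  # ['not worth', 'worth the', 'the money']
print(ngrams(tokens, 3))  # ['not worth the', 'worth the money']
```

A text of T tokens yields T - N + 1 n-grams, and the number of distinct n-grams in a corpus grows quickly with N, which is why the feature vectors become large.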

  9. ML: Document representation (3) • Stopword removal • using a list of determiners, prepositions, possessive pronouns, ... • Negation tagging • each word following a negation is tagged, up to the first punctuation mark • e.g. I don't like this movie. -> I don't NOT_like NOT_this NOT_movie.
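
Both preprocessing steps can be sketched as below. The word lists `NEGATIONS` and `STOPWORDS`, the regular-expression tokenizer, and the function names are assumptions for this example; the slides do not say which lists or tokenizer were used, nor in which order the two steps were applied.

```python
import re

# Assumed, deliberately short word lists for illustration only.
NEGATIONS = {"not", "no", "never", "don't", "doesn't", "didn't", "isn't", "wasn't"}
STOPWORDS = {"the", "a", "an", "this", "of", "in", "my"}
PUNCTUATION = set(".,!?;:")

def tokenize(text):
    """Split text into word tokens (keeping apostrophes) and punctuation marks."""
    return re.findall(r"[\w']+|[.,!?;:]", text)

def tag_negation(tokens):
    """Prefix NOT_ to every word after a negation, up to the next punctuation mark."""
    out, negated = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False          # punctuation ends the negation scope
            out.append(tok)
        elif negated:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            if tok.lower() in NEGATIONS:
                negated = True       # start tagging from the next word on
    return out

def remove_stopwords(tokens):
    """Drop tokens whose lowercased form is in the stopword list."""
    return [tok for tok in tokens if tok.lower() not in STOPWORDS]

print(tag_negation(tokenize("I don't like this movie.")))
# ['I', "don't", 'NOT_like', 'NOT_this', 'NOT_movie', '.']
print(remove_stopwords(tokenize("the plot of this movie")))
# ['plot', 'movie']
```

Tagging turns "like" and "NOT_like" into two distinct features, so a bag-of-words classifier can tell "I like" from "I don't like" even though it ignores word order.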

  10. ML: Techniques • Classifiers successful for text classification • Support Vector Machines (SVM) • Naive Bayes Multinomial (NBM) • Maximum Entropy (Maxent)
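
As an illustration of one of these classifiers, here is a small multinomial Naive Bayes written from scratch. It is a sketch under stated assumptions, not the setup used in the experiments: the class name `MultinomialNB`, the add-one (Laplace) smoothing parameter `alpha`, and the four toy training documents are all invented for the example.

```python
import math
from collections import Counter, defaultdict

class MultinomialNB:
    """Multinomial Naive Bayes over token lists, with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (alpha=1 is Laplace smoothing)

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)      # token counts per class
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            for tok in tokens:
                if tok in self.vocab:  # ignore tokens never seen in training
                    score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy data, made up for illustration.
train_docs = [["amazing", "show"], ["loved", "it"], ["boring", "plot"], ["not", "worth", "it"]]
train_labels = ["pos", "pos", "neg", "neg"]

clf = MultinomialNB().fit(train_docs, train_labels)
print(clf.predict(["amazing", "show"]))  # pos
print(clf.predict(["boring", "plot"]))   # neg
```

Training only counts tokens per class and prediction only sums logarithms, so both steps are cheap even with large feature vectors.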

  11. Challenges (1) • Topic-sentiment relation • e.g. Competing with the vastly superior Casino Royale for the same action-movie audience, Deja Vu will likely be brushed aside and quickly forgotten. • e.g. A Good Year is a well-acted well-written well-directed movie but it just wasnt my cup of tea. • Topic-neutral text • e.g. In the movie Bond can start to untangle a terror network if he wins this big poker game at Casino Royale in Montenegro.

  12. Challenges (2) • Cross-domain classification • Training (and testing) was done on a mixture of movie and car reviews • Text quality • e.g. Nothing but a French kiss-off Search Recent Archives Web for (rm) else • • • • • • • • • • • • • • • • ONLINE EXTRAS SITE SERVICES Movie Listings Friday Nov 10 2006 Posted on Fri Nov. 10 2006 MOVIE REVIEW A Good Year a flat bouquet Nothing but a French kiss-off Gladiator collaborators seem defeated by light-weight love story.By ROBERT W.

  13. Overview • Introduction • Emotions • Machine learning (ML) techniques • Challenges • Experiments, results & discussion • Conclusion & future work

  14. Corpora • Pang and Lee's movie review corpus • 1000 positive and 1000 negative reviews • Reviews mix objective and subjective information • Often used in the literature • Our blog corpus • 759 positive, 205 negative and 3527 neutral sentences • Gathered from blogs, discussion boards and other websites • Extended with reviews from the Customer Review Datasets corpus by Hu and Liu to balance the positive and negative classes

  15. Evaluation measures • Accuracy • Precision • Recall • Other • Speed • Available resources
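
The formulas for these measures are not part of this transcript. The sketch below uses the standard textbook definitions, which are assumed rather than taken from the slides: accuracy is the fraction of correct predictions, and for a given class precision is TP / (TP + FP) and recall is TP / (TP + FN), where TP, FP and FN count true positives, false positives and false negatives for that class. The function name `evaluate` and the label lists are made up for the example.

```python
def evaluate(gold, predicted, target):
    """Return (accuracy, precision, recall), the last two for the class `target`."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    accuracy = sum(1 for g, p in pairs if g == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # guard against no predictions
    recall = tp / (tp + fn) if tp + fn else 0.0     # guard against no gold items
    return accuracy, precision, recall

# Toy gold labels and predictions (made up for illustration).
gold = ["pos", "pos", "neg", "neu", "neu", "neg"]
pred = ["pos", "neu", "neg", "neu", "pos", "neg"]

accuracy, precision, recall = evaluate(gold, pred, target="pos")
print(round(accuracy, 3), precision, recall)  # 0.667 0.5 0.5
```

On a corpus dominated by one class, such as a sentence set that is mostly neutral, accuracy alone can look high for a classifier that rarely finds the minority classes, so per-class precision and recall are the more informative figures.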

  16. Results (1) • Pang and Lee's movie review corpus • N-grams: + easy to extract, + require no special tools, − large feature vector size • NBM: + fast

  17. Results (2) • Our blog corpus • The baseline approach: uses basic ML techniques as described earlier • Our latest approach: achieves considerable improvements over the baseline

  18. Conclusion & future work • Detection of the topic-sentiment relation is far from perfect • Dirty texts make the task even more difficult • Lack of training examples
