Sentiment analysis

Presentation Transcript

  1. Sentiment analysis: or, how to find happiness.

  2. Why do we want sentiment info?
     • Useful input for detection
       • Brand sentiment
     • Useful input for prediction
       • Stock market, box office revenues, political outcomes
       • Potentially for social uprisings, terrorist incidents

  3. What do you really want to know?

  4. Brand satisfaction

  5. Quality of life

  6. Abstract predictor

  7. Three considerations for a sentiment analysis system
     • Data cleaning
     • One piece of the puzzle
     • Simple works best

  8. Data cleaning (because it’s a dirty world)

  9. Data cleaning: on Twitter…
     • Spam accounts
     • Bots (weather, sport, etc.)
     Answer:
     a) http://trst.me/ (from infochimps)
     b) Make your own system

  10. Data cleaning: from sentences to words
     • Tokenize the sentence(s) into words. (This may not be as easy as it seems.)
     • Maybe do stopping/stemming, depending on the application.
     • Pick a threshold for the number of times a word must appear in the training set; ignore any word seen fewer times than that.
     • Build a dictionary of the remaining words.
     Answer:
     a) Twokenize.py
     b) Write your own
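The steps on this slide can be sketched in a few lines of Python. This is a minimal illustration of the "write your own" option, not the behaviour of Twokenize.py: the regular expression, the function names `tokenize` and `build_vocabulary`, and the default threshold `min_count=2` are all assumptions made for the example, and stopping/stemming is left out.

```python
import re
from collections import Counter

# Hypothetical, simplified tweet tokenizer (an assumption for illustration).
# It keeps URLs, @mentions, #hashtags and a few emoticons as single tokens.
TOKEN_RE = re.compile(
    r"""
    https?://\S+                # URLs
    | [@#]\w+                   # mentions and hashtags
    | [:;=][-']?[()\[\]/\\]     # emoticons such as :) :( :/ ;-)
    | \^_\^ | <3                # other emoticons
    | \w+(?:'\w+)?              # words, optionally with an apostrophe
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Split a tweet into lower-cased tokens."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]


def build_vocabulary(texts, min_count=2):
    """Map each word seen at least `min_count` times to an integer index."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    kept = sorted(word for word, n in counts.items() if n >= min_count)
    return {word: index for index, word in enumerate(kept)}


if __name__ == "__main__":
    tweets = ["good day", "good night", "bad day"]
    print(tokenize("I love this :) http://t.co/abc #happy"))
    print(build_vocabulary(tweets, min_count=2))
```

Words below the threshold are simply dropped here; mapping them to a shared "unknown" token would be an equally reasonable choice.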

  11. One piece of the puzzle

  12. Always make it part of a system
     • When it’s wrong (and this is quite often) it will be very obviously wrong
     • People don’t need to see this
     • This doesn’t actually detract from the utility of the system

  13. Success:
     • Tracking political polls
     • Predicting box office revenues
     • Predicting the stock market

  14. Simple works best (for now)

  15. The quick version
     • Use a supervised or semi-supervised learning method.
     • For most cases I would recommend Naïve Bayes on the bag-of-words representation. It is very simple to implement and gives near-best performance.
     • If you don’t have any examples of happy/sad tweets (for your purpose), use known keywords, such as emoticons, as labels.
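To show how little code the recommended approach needs, here is a minimal sketch of multinomial Naïve Bayes over bag-of-words counts, written with the standard library only. The class name `NaiveBayes`, the add-one smoothing (`alpha=1.0`), the whitespace tokenization and the four training sentences are all assumptions made for illustration; they are not taken from the talk.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes on bag-of-words counts (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # additive smoothing (assumed)
        self.class_counts = Counter()            # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()

    def fit(self, docs, labels):
        """`docs` is a list of token lists, `labels` the matching classes."""
        for tokens, label in zip(docs, labels):
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, tokens):
        """Return the unnormalised log posterior for every class."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][tok]
                score += math.log(
                    (count + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Tiny made-up training set, for illustration only.
TRAIN = [
    ("i love this great day", "pos"),
    ("what a great happy film", "pos"),
    ("i hate this awful day", "neg"),
    ("what a sad awful film", "neg"),
]

model = NaiveBayes().fit(
    [text.split() for text, _ in TRAIN],
    [label for _, label in TRAIN],
)

if __name__ == "__main__":
    print(model.predict("great happy day".split()))
    print(model.predict("sad awful film".split()))
```

In practice a library implementation would probably be used instead; the point of the sketch is only that the whole method is a handful of counts and logarithms.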

  16. :)

  17. ^_^

  18. :(

  19. <3

  20. :/
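The emoticons on the preceding slides can serve as noisy labels when no hand-labelled tweets are available. The sketch below is one way that might look. The split into positive and negative sets is an assumption (the slides list the emoticons without assigning a polarity), as are the function names, and only emoticons separated from the text by whitespace are detected.

```python
# Assumed polarity of the emoticons shown on the slides.
POSITIVE = {":)", "^_^", "<3"}
NEGATIVE = {":(", ":/"}
EMOTICONS = POSITIVE | NEGATIVE


def label_by_emoticon(text):
    """Return "pos", "neg", or None when there is no clear signal."""
    tokens = set(text.split())
    has_pos = bool(tokens & POSITIVE)
    has_neg = bool(tokens & NEGATIVE)
    if has_pos == has_neg:
        return None  # no emoticon at all, or conflicting emoticons
    return "pos" if has_pos else "neg"


def strip_emoticons(text):
    """Remove the emoticon tokens so a classifier cannot simply memorise them."""
    return " ".join(tok for tok in text.split() if tok not in EMOTICONS)


def make_training_set(texts):
    """Build (text, label) pairs from the tweets that carry a clear emoticon."""
    examples = []
    for text in texts:
        label = label_by_emoticon(text)
        if label is not None:
            examples.append((strip_emoticons(text), label))
    return examples


if __name__ == "__main__":
    tweets = ["great day :)", "no emoticon here", "missed the bus :("]
    print(make_training_set(tweets))
```

Matching whole whitespace-separated tokens keeps the ":/" inside URLs such as "http://" from being counted as a negative emoticon, at the cost of missing emoticons glued to a word.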

  21. Things that don’t really help (generally less than 2% improvement)
     • More advanced classifiers (e.g. SVMs)
     • Part-of-speech tagging
     • Parse trees
     • Semi-supervised methods if you have very large amounts of data

  22. The formula for happiness

  23. Basic positive/negative Twitter sentiment word list • http://alexdavies.net/projects/twitter-sentiment-word-lists/

  24. Thanks.