
A Semantic, Supervised Classification Approach to Restaurant Reviews



Presentation Transcript


  1. A Semantic, Supervised Classification Approach to Restaurant Reviews
     Pavani Vantimitta

  2. Problem definition
     • Reviews are an important source of information on new businesses.
     • Use semantic information in the reviews to predict the rating assigned to a review.
     • Use machine learning classifiers, including a MaxEnt classifier.

  3. Data Collection
     • Restaurant reviews from yelp.com for places around Palo Alto
     • Use “Web-harvest”, a web extraction tool, to convert the reviews into text files
     • Training data comprises 61 restaurants and 1971 reviews; validation data comprises 12 restaurants and 361 reviews; test data comprises 10 restaurants and 260 reviews.

  4. Preprocessing
     • Removing multiple spaces between words and sentences, and multiple punctuation marks
     • Inserting a space between a punctuation mark and the preceding word
     • The final data collected contains
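
The preprocessing steps listed on this slide can be sketched with a few regular expressions. This is a minimal illustration, not the author's script: the function name `normalize_review` and the set of punctuation marks handled are assumptions.

```python
import re

def normalize_review(text: str) -> str:
    """Normalize a raw review (hypothetical helper, not from the slides)."""
    # Collapse runs of the same punctuation mark, e.g. "!!!" -> "!".
    text = re.sub(r"([.,!?;:])\1+", r"\1", text)
    # Insert a space between a punctuation mark and the preceding word.
    text = re.sub(r"(\w)([.,!?;:])", r"\1 \2", text)
    # Collapse runs of whitespace into a single space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()

print(normalize_review("Great   food!!!  Really,  loved it..."))
```

Note that this simple pattern also splits decimals such as "3.5" into "3 .5"; a real pipeline would need to decide how to treat numbers.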

  5. Part-of-Speech Tagging
     • Stanford POS tagger

  6. Semantic information
     • Extracting tags from words lets us understand, to some extent, the tone of the review
     • Aim to use only adjectives (words tagged as ‘JJ’) for classification
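
Keeping only the adjectives can be sketched as a filter over tagged tokens. The sketch below assumes the tagger output has already been written as whitespace-separated `word_TAG` tokens; the separator, the lowercasing, and the function name `extract_adjectives` are assumptions, and the tagger itself is not invoked here.

```python
from typing import List

def extract_adjectives(tagged_text: str, sep: str = "_") -> List[str]:
    """Return the words tagged exactly 'JJ' (hypothetical helper)."""
    adjectives = []
    for token in tagged_text.split():
        # Split on the last separator so words containing it stay intact.
        word, found, tag = token.rpartition(sep)
        if found and tag == "JJ":
            adjectives.append(word.lower())
    return adjectives

tagged = "The_DT food_NN was_VBD delicious_JJ and_CC the_DT staff_NN friendly_JJ ._."
print(extract_adjectives(tagged))
```

Comparative and superlative tags (`JJR`, `JJS`) are deliberately excluded, since the slide mentions only words tagged as ‘JJ’.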

  7. Vocabulary
     • Full vocabulary (all words tagged as ‘JJ’)
     • Vocabulary cut down by word count { 4, 10, 50, 100, 500 }
     • Vocabulary cut down by comparing words appearing in reviews with different ratings
     • Stemming: Lovins Stemmer and Iterated Lovins Stemmer
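
One way to read the count-based cut is as a minimum-frequency threshold. The slide does not say whether the values { 4, 10, 50, 100, 500 } are minimum counts or vocabulary sizes, so the sketch below assumes a minimum count; `build_vocabulary` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List, Set

def build_vocabulary(reviews: Iterable[List[str]], min_count: int) -> Set[str]:
    """Keep adjectives seen at least `min_count` times across all reviews.

    Assumption: the cutoff is a minimum frequency, not a top-N size.
    """
    counts = Counter(word for review in reviews for word in review)
    return {word for word, n in counts.items() if n >= min_count}

# Toy adjective lists, one per review (made-up data for illustration).
reviews = [["good", "tasty"], ["good", "bad"], ["good", "tasty"]]
print(sorted(build_vocabulary(reviews, min_count=2)))
```

Raising the threshold shrinks the feature set, which trades coverage of rare adjectives for a smaller, less noisy model input.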

  8. Classifiers

  9. Variations in classification
     • V1: Each rating as a separate class
     • V2: Rating 1 as one class and rating 5 as another class
     • V3: Ratings 1, 2, 3 as one class and ratings 4, 5 as another class
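
The three variations amount to different mappings from a star rating to a class label. The sketch below is an illustration only: the label names "low" and "high" are invented, and it assumes that V2 simply drops reviews rated 2 to 4, which the slide does not state.

```python
from typing import Optional, Union

def map_rating(rating: int, variation: str) -> Optional[Union[int, str]]:
    """Map a 1-5 star rating to a class label; None means 'drop the review'."""
    if variation == "V1":
        return rating  # five classes, one per rating
    if variation == "V2":
        # Assumption: only the extreme ratings are kept.
        return rating if rating in (1, 5) else None
    if variation == "V3":
        return "low" if rating <= 3 else "high"
    raise ValueError(f"unknown variation: {variation}")

print([map_rating(r, "V3") for r in range(1, 6)])
```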

  10. Naïve Bayes
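
The content of this slide did not survive in the transcript. As a generic reminder of the technique, here is a minimal multinomial Naïve Bayes sketch over adjective lists with add-one smoothing. The variant, the smoothing, and all names are assumptions rather than the author's setup.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels):
    """docs: list of word lists; labels: parallel list of class labels."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for words, label in zip(docs, labels):
        word_counts[label].update(words)
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def predict(model, words):
    """Return the label with the highest log posterior for `words`."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for w in words:
            if w not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothed log likelihood.
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data (made up for illustration).
docs = [["great", "delicious"], ["friendly", "great"], ["terrible", "rude"], ["bland", "terrible"]]
labels = ["high", "high", "low", "low"]
model = train_naive_bayes(docs, labels)
print(predict(model, ["great"]), predict(model, ["terrible", "rude"]))
```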

  11. Without Stemming

  12. With Stemming

  13. Results with Stemming

  14. MaxEnt Classifier: Variation 3: Best features set has 33 features

  15. Comparison to other classifiers for variation 1

  16. Conclusion

  17. Future Work
      • Sentence boundary
      • Incorporate N-gram models
      • Predict a rating for each sentence in a review, then average the results for the full review; this takes conflicting tones into account.
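
The per-sentence averaging idea can be sketched as follows. This is only an illustration of the proposal: the sentence splitter is a naive regular expression, and `rate_sentence` stands in for whatever sentence-level classifier would be used; both are assumptions.

```python
import re
from typing import Callable

def average_sentence_rating(review: str, rate_sentence: Callable[[str], float]) -> float:
    """Rate each sentence separately and return the mean rating."""
    # Naive split after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]
    if not sentences:
        raise ValueError("review contains no sentences")
    return sum(rate_sentence(s) for s in sentences) / len(sentences)

# Toy stand-in for a sentence-level classifier (hypothetical).
def toy_rater(sentence: str) -> float:
    return 5.0 if "great" in sentence.lower() else 1.0

print(average_sentence_rating("The food was great. The service was slow.", toy_rater))
```

A review that praises the food but criticizes the service then lands between the two extremes instead of being forced into one of them.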

  18. THANK YOU!
