CS 533 Information Retrieval Systems





  1. Semantic Analysis of Product Reviews for Feature Summarization • ERDEM ÖZDEMİR, UTKU OZAN YILMAZ, BUĞRA MEHMET YILDIZ, ÖMER FARUK UZAR • Bilkent University, Computer Engineering Department • CS 533 Information Retrieval Systems

  2. Outline • Introduction • Motivation • Sentiment Analysis of Product Reviews • Preparation of Dataset • Learning • Processing of Product Reviews • Learning Association Rules • Presentation of Results • Progress So Far • Summary • Conclusion

  3. Introduction • User participation in Web sites increased with Web 2.0 • Product reviews are written by users on e-commerce sites • User opinions • Essential, as they reflect the real experience of the people who actually use the products

  4. Introduction • Use opinion mining (sentiment analysis) • Derive user opinions about product features • Determine their sentiment orientation • Analyze whether an opinion is positive or negative • Summarize that information for the user • Dataset • Use Turkish product reviews for mobile phones

  5. Motivation • The experience of a product’s users influences people who consider buying it • Analysis of that experience is useful for buyers, producers and e-commerce systems • As the number of reviews in e-commerce systems grows, users read only a small fraction of them • This usually leaves them unaware of some product features and of the opinions about them • Product reviews are generally repetitive • Reading all of them is generally inefficient • There is a need for summarization of product reviews • No such system exists for the Turkish language

  6. Sentiment Analysis of Product Reviews • It consists of five steps • Preparation of Dataset • Learning • Processing of Product Reviews • Learning Association Rules • Presentation of Results

  7. Preparation of Dataset • Use mobile phone reviews from Hepsiburada.com • Choice is based on the size of the dataset provided • Parse the website • To find links to cell phones • To extract user reviews • Strip HTML tags from the text • Put the parsed text into a database with some extra information • Reviewer’s grade of the product • People’s grade of the review, etc.
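
The tag-stripping and storage steps on this slide can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' crawler: the table layout, the column names (`reviewer_grade`, `helpful_votes`), and the sample product name are all assumptions made for the example.

```python
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and drops the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse the whitespace left behind by the removed tags.
    return " ".join("".join(parser.parts).split())


def store_review(conn, product, html, reviewer_grade, helpful_votes):
    # Hypothetical schema: one row per review plus the extra information.
    conn.execute(
        "INSERT INTO reviews (product, text, reviewer_grade, helpful_votes) "
        "VALUES (?, ?, ?, ?)",
        (product, strip_tags(html), reviewer_grade, helpful_votes),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews (product TEXT, text TEXT, "
    "reviewer_grade INTEGER, helpful_votes INTEGER)"
)
store_review(conn, "phone-x", "<p>Kamerası <b>iyi</b> çekiyor.</p>", 5, 3)
rows = conn.execute("SELECT text, reviewer_grade FROM reviews").fetchall()
```

Keeping the reviewer's grade next to the cleaned text matters because the learning step later uses that grade to estimate word orientation.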

  8. Learning • Calculate sentiment orientation of words • Using WordNet with seeded oriented words, or Turney’s approach using search engine queries, is not suitable for Turkish • Best approach so far is using the reviewer’s grade of the product • For each opinion word $ow_j$ • $\mathrm{Orientation}(ow_j) = \dfrac{\sum_i tf_{i,j} \times idf_j \times g_i}{|\{r : ow_j \in r\}|}$
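
A small sketch of how the orientation score above might be computed. The slide does not define its symbols, so the following are assumptions: `tf` is the raw count of the word in review i, `idf` is log(N / number of reviews containing the word), and the grade g_i has been mapped to a signed value (for instance, stars minus the midpoint) so that words from low-graded reviews come out negative.

```python
import math
from collections import Counter, defaultdict


def orientation_scores(reviews):
    """reviews: list of (tokens, signed_grade) pairs; returns {word: score}."""
    n_reviews = len(reviews)

    # Number of reviews that contain each word (the denominator on the slide).
    doc_freq = Counter()
    for tokens, _ in reviews:
        doc_freq.update(set(tokens))

    # Sum tf * idf * grade over all reviews, per word.
    totals = defaultdict(float)
    for tokens, grade in reviews:
        for word, tf in Counter(tokens).items():
            idf = math.log(n_reviews / doc_freq[word])
            totals[word] += tf * idf * grade

    return {word: totals[word] / doc_freq[word] for word in doc_freq}


# Toy data with hypothetical signed grades.
reviews = [
    (["iyi", "telefon"], 2),
    (["kötü", "telefon"], -2),
    (["iyi"], 1),
]
scores = orientation_scores(reviews)
```

With this toy data, "iyi" (good) gets a positive score, "kötü" (bad) a negative one, and "telefon" cancels out to zero. Under the assumed idf, a word that occurs in every review also scores zero.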

  9. Learning • Calculation of likelihood of feature-opinion match • For each sentence • Find features and opinions • Count number of times they appear together • Count their individual appearances • Calculate likelihood of feature-opinion match • $\dfrac{|\mathrm{Feature}_i \,\&\, \mathrm{Opinion}_j|^2}{|\mathrm{Feature}_i| \times |\mathrm{Opinion}_j|}$
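
The counting procedure on this slide can be illustrated as below. The sketch assumes that features and opinions have already been extracted per sentence and that each is counted at most once per sentence; the input format and the toy sentences are hypothetical.

```python
from collections import Counter
from itertools import product


def match_likelihoods(sentences):
    """sentences: list of (features, opinions) pairs, each a set of strings."""
    feature_count = Counter()
    opinion_count = Counter()
    pair_count = Counter()
    for features, opinions in sentences:
        feature_count.update(features)
        opinion_count.update(opinions)
        # Every feature/opinion combination in the sentence co-occurs once.
        pair_count.update(product(features, opinions))

    # Squared co-occurrence count over the product of individual counts.
    return {
        (f, o): c ** 2 / (feature_count[f] * opinion_count[o])
        for (f, o), c in pair_count.items()
    }


sentences = [
    ({"telefon", "fiyat"}, {"iyi"}),
    ({"telefon"}, {"iyi", "kullanışlı"}),
    ({"kamera"}, {"iyi"}),
]
likelihood = match_likelihoods(sentences)
```

Here "telefon" and "iyi" co-occur twice, with individual counts of 2 and 3, so their score is 4 / 6. The score reaches 1 only when a feature and an opinion always appear together.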

  10. Processing of Product Reviews • Aims to find <Product, Feature> and <Feature, Opinion> matches • Example • “Fiyatına göre iyi bir telefon kullanışlı tavsiye ederim.” (“A good phone for its price, handy, I recommend it.”) • Features: telefon (phone), fiyat (price) • Opinions: iyi (good), kullanışlı (handy), tavsiye ederim (I recommend) • Matches: <telefon, iyi>, <telefon, kullanışlı>, <telefon, tavsiye ederim>

  11. Processing of Product Reviews • The first step is to apply a POS tagger to a sentence • “Konuşurken karşı tarafın sesi sanki biraz az geliyor gibi geldi bana.” (“While talking, it seemed to me as if the other party’s voice was coming through a bit low.”) → “Adverb Adj Noun+A3sg+Pnon+Gen Noun+A3sg+P3sg+NomFet Adj Adj Adj Verb+Pos+Prog1+A3sg Postp Verb+Pos+Past+A3sg Pron+A1sg+Pnon+Dat Punc” • To find opinions we use only adjectives, so we miss some opinion phrases such as “tavsiye ediyorum” (“I recommend it”) • To find features, we look them up in a list we already have • “Kamerası iyi çekiyor.” (“Its camera shoots well.”; explicit feature: kamera) • “Telefon çekim kalitesi yüksek.” (“The phone’s shooting quality is high.”; implicit feature: kamera?)

  12. Processing of Product Reviews • Assignment of opinions to features • Use rules • (Adv) Adj (Num) Noun, Noun (Adv|Adj) Adj Punc • Use likelihood values • Find the assignment between features and opinions that maximizes the sum of the likelihoods learned earlier in the learning step • Store features, feature-opinion pairs and the places where they are mentioned in the product reviews
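
One way the likelihood-based assignment could look is sketched below. It assumes each opinion is attached to exactly one feature while a feature may receive several opinions, as in the earlier "telefon" example. Under that assumption the sum of likelihoods is maximized by picking the best feature for each opinion independently. The slide does not state the constraints actually used, and the likelihood values are made up for illustration.

```python
def assign_opinions(features, opinions, likelihood):
    """Map each opinion to the feature with the highest learned likelihood.

    likelihood: dict keyed by (feature, opinion); unseen pairs count as 0.
    """
    if not features:
        return {}
    assignment = {}
    for opinion in opinions:
        # Each term of the sum depends on one opinion only, so choosing the
        # best feature per opinion maximizes the total.
        assignment[opinion] = max(
            features, key=lambda f: likelihood.get((f, opinion), 0.0)
        )
    return assignment


# Hypothetical learned values.
learned = {
    ("telefon", "iyi"): 0.6,
    ("fiyat", "iyi"): 0.3,
    ("telefon", "kullanışlı"): 0.5,
}
result = assign_opinions(["telefon", "fiyat"], ["iyi", "kullanışlı"], learned)
```

If the assignment had to be one-to-one instead, the per-opinion choice would no longer be optimal and a matching algorithm would be needed.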

  13. Learning Association Rules • Perform association rule analysis to obtain frequent feature itemsets • Use transactions extracted in the previous step • Association rule • Implication in the form of X => Y • Existence of X implies existence of Y • Two kinds of association rules • Product => Feature • Feature => Opinion • After obtaining such association rules, prune the ones that do not occur frequently and the ones that are not interesting with regard to their sentiment orientation
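
A rough sketch of mining and pruning single-item rules X => Y from transactions. Support is used as the frequency criterion. The confidence value reported alongside it is a standard measure but is not mentioned on the slide, so it is an addition here, as are the threshold and the toy transactions. Pruning by sentiment orientation is not shown.

```python
from collections import Counter
from itertools import permutations


def mine_rules(transactions, min_support=0.5):
    """Return {(x, y): (support, confidence)} for rules x => y.

    transactions: list of item collections, e.g. a feature and its opinions.
    """
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for items in transactions:
        unique = set(items)
        item_count.update(unique)
        # Ordered pairs, so that x => y and y => x are counted separately.
        pair_count.update(permutations(sorted(unique), 2))

    rules = {}
    for (x, y), count in pair_count.items():
        support = count / n
        if support >= min_support:  # prune infrequent rules
            rules[(x, y)] = (support, count / item_count[x])
    return rules


transactions = [
    {"kamera", "iyi"},
    {"kamera", "iyi"},
    {"kamera", "kötü"},
    {"pil", "iyi"},
]
rules = mine_rules(transactions)
```

The sketch does not know which items are features and which are opinions, so a caller would still keep only the Product => Feature and Feature => Opinion directions.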

  14. Presentation of Results • Provide a web user interface • Users access the results by submitting the name of the product they want information about • Example Interface

  15. Progress So Far • Accomplished most of the essential steps of our project • Prepared our dataset • Fetched data from Hepsiburada.com • Processed it • Put it into a database • Performed sentiment analysis • Obtained promising results with our methods • Now, we are working on our web user interface and processing of product reviews

  16. Summary • Project’s five steps • Preparation of Dataset • Learning • Processing of Product Reviews • Learning Association Rules • Presentation of Results

  17. Conclusion • Problems • Authors don’t use the language properly and correctly • There is no tool to perform syntax analysis of Turkish • Evaluation problem: How to calculate recall? • Simple solutions generally work better in diverse datasets and high dimensional problems
