
Criticism Mining: Text Mining Experiments on Book, Movie and Music Reviews


Presentation Transcript


  1. THE ANDREW W. MELLON FOUNDATION Criticism Mining: Text Mining Experiments on Book, Movie and Music Reviews Xiao Hu, J. Stephen Downie, M. Cameron Jones The International Music Information Retrieval Systems Evaluation Lab (IMIRSEL) University of Illinois at Urbana-Champaign

  2. Agenda • Motivation • Customer reviews in epinions.com • Experimental Setup • Data set • Results • Conclusions & Future Work

  3. Motivation • Critical consumer-generated reviews of humanities materials • a rich resource of reviewers’ opinions and background/contextual information • self-organized, which paves the way for automatic processing • Text mining: mature and ready to use • Criticism mining: provides a tool to assist humanities scholars in • locating • organizing • analyzing critical review content

  4. Customer Reviews • Published on www.epinions.com • Focused on the book, movie and music categories • Each review is associated with: • a genre label • a numerical quality rating

  5. The numerical rating associated with each review was used in our experiments

  6. Music Genres • 28 major genre categories: Jazz, Rock, Country, Classical, Blues, Gospel, Punk, … • Renaissance, Medieval, Baroque, Romantic, …

  7. Experimental Setup • To build and evaluate a prototype criticism mining system that could automatically: • predict the genre of the work being reviewed • predict the quality rating assigned to the reviewed item • differentiate book reviews and movie reviews, especially for items in the same genre • differentiate fiction and non-fiction book reviews

  8. Data set

  9. Genre Taxonomy

  10. Genre Taxonomy : Book

  11. Genre Taxonomy : Music • The genre labels and the rating information provided the ground truth for experiments

  12. Data Preprocessing • HTML tags were stripped out; • Stop words were NOT stripped out; • Punctuation was NOT stripped out; • They may contain stylistic information • Tokens were stemmed
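The preprocessing steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the actual T2K pipeline; `naive_stem` is a toy suffix stripper standing in for a real stemmer such as Porter's:

```python
import re

def strip_html(text):
    """Remove HTML tags but keep the text (and its punctuation) intact."""
    return re.sub(r"<[^>]+>", " ", text)

def naive_stem(token):
    """A toy suffix-stripping stemmer standing in for a real one (e.g. Porter)."""
    for suffix in ("ing", "edly", "ed", "ly", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(review_html):
    """Per the slide: strip HTML, keep stop words and punctuation
    (they may carry stylistic information), and stem the word tokens."""
    text = strip_html(review_html).lower()
    # Tokenize on letters; punctuation marks become tokens of their own.
    tokens = re.findall(r"[a-z]+|[!?.,;:]", text)
    return [naive_stem(t) if t.isalpha() else t for t in tokens]
```

Note that, unlike most text-classification pipelines, stop words and punctuation deliberately survive preprocessing here, since they can act as stylistic features.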

  13. Categorization Model & Implementation • Naïve Bayesian (NB) Classifier • Computationally efficient • Empirically effective • Text-to-Knowledge (T2K) Toolkit • A text mining framework • Ready-to-use modules and itineraries • Natural Language Processing tools integrated • Supporting fast prototyping of text mining
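A multinomial Naïve Bayes classifier with add-one smoothing can be sketched as below, to illustrate the model the T2K itinerary wraps. This is not the T2K API; the class and method names are invented for illustration:

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)   # label -> word -> count
        self.label_counts = Counter(labels)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        n_docs = sum(self.label_counts.values())
        v = len(self.vocab)

        def log_score(label):
            # log P(label) + sum over tokens of log P(token | label)
            total = sum(self.word_counts[label].values())
            prior = math.log(self.label_counts[label] / n_docs)
            return prior + sum(
                math.log((self.word_counts[label][t] + 1) / (total + v))
                for t in tokens
            )

        return max(self.label_counts, key=log_score)
```

Working in log space keeps the products of many small per-token probabilities from underflowing, which is part of why NB stays computationally efficient on large review collections.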

  14. NB itinerary in T2K: Data Preprocessing → NB Classifier

  15. Results & Discussions

  16. Genre Classification • 5-fold random cross validation for book and movie reviews • 3-fold random cross validation for music reviews
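The random k-fold protocol can be sketched as below. This is an assumed reconstruction, not the evaluation code used in the experiments; `train_and_eval` is a hypothetical caller-supplied function that trains a classifier on the training pairs and returns accuracy on the held-out fold:

```python
import random

def k_fold_accuracy(docs, labels, k, train_and_eval, seed=0):
    """Random k-fold cross validation: shuffle, split into k folds,
    train on k-1 folds, test on the held-out fold, average the accuracies."""
    indices = list(range(len(docs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]      # k disjoint folds
    accuracies = []
    for i in range(k):
        test_idx = set(folds[i])
        train = [(docs[j], labels[j]) for j in indices if j not in test_idx]
        test = [(docs[j], labels[j]) for j in folds[i]]
        accuracies.append(train_and_eval(train, test))
    return sum(accuracies) / k
```

For example, `k_fold_accuracy(docs, labels, 5, run_nb)` would reproduce the 5-fold setup used for book and movie reviews, with `k=3` for the smaller music-review set.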

  17. Confusion : Book Reviews

  18. Confusion : Movie

  19. Confusion : Music

  20. Rating Classification • Five-class classification • 1 star vs. 2 stars vs. 3 stars vs. 4 stars vs. 5 stars • Binary group classification • 1 star + 2 stars vs. 4 stars + 5 stars • Ad extremis classification • 1 star vs. 5 stars • 5-fold random cross validation for all experiments
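The three labeling schemes can be expressed as one mapping. This is a sketch; the scheme names are invented labels, and returning `None` marks reviews that would be excluded from that experiment:

```python
def regroup_rating(stars, scheme):
    """Map a 1-5 star rating onto one of the three labeling schemes."""
    if scheme == "five-class":                 # 1 vs. 2 vs. 3 vs. 4 vs. 5
        return stars
    if scheme == "binary":                     # 1+2 stars vs. 4+5 stars
        if stars in (1, 2):
            return "negative"
        if stars in (4, 5):
            return "positive"
        return None                            # 3-star reviews excluded
    if scheme == "extremes":                   # ad extremis: 1 star vs. 5 stars
        return stars if stars in (1, 5) else None
    raise ValueError(f"unknown scheme: {scheme}")
```

The coarser the grouping, the easier the task: the binary and ad extremis setups discard the ambiguous middle ratings that drive most classifier confusion.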

  21. Rating : Book Reviews

  22. Rating : Movie Reviews

  23. Rating : Music Reviews

  24. Confusion : Book Reviews

  25. Confusion : Movie Reviews

  26. Confusion : Music Reviews

  27. Classification of Book and Movie Reviews 1 • Reviews on all available genres • Books : 9 genres; Movies : 11 genres • Reviews on individual, comparable genres

  28. Classification of Book and Movie Reviews 2 • Eliminated words that can directly suggest the categories: • "book", "movie", "fiction", "film", "novel", "actor", "actress", "read", "watch", "scene" • Words that occur frequently in one category but not the other • To make the task harder / avoid oversimplifying • Results suggest a stylistic difference in users’ criticisms of books and movies • 5-fold random cross validation for all experiments
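Filtering out the category-revealing words before training might look like this. A minimal sketch using the word list from the slide; in the actual experiments the tokens were also stemmed, which this illustration ignores:

```python
# Words that directly give away whether a book or a movie is being reviewed
# (list taken from the slide).
REVEALING = {"book", "movie", "fiction", "film", "novel",
             "actor", "actress", "read", "watch", "scene"}

def drop_revealing(tokens, revealing=REVEALING):
    """Remove giveaway words so the classifier must rely on stylistic cues."""
    return [t for t in tokens if t.lower() not in revealing]
```

The same helper with a different word set would cover the fiction vs. non-fiction experiment on slide 33.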

  29. Book vs. Movie Reviews 1

  30. Book vs. Movie Reviews 2

  31. Book vs. Movie Reviews 3

  32. Classification of Fiction and Non-fiction Book Reviews 1

  33. Classification of Fiction and Non-fiction Book Reviews 2 • Eliminated words that can directly suggest the categories: • "fiction", "non", "novel", "character", "plot", and "story" • Words that occur frequently in one category but not the other • To make the task harder / avoid oversimplifying • Results suggest a stylistic difference in users’ criticisms of fiction and non-fiction books • 5-fold random cross validation for all experiments

  34. Fiction vs. Non-fiction Book Reviews

  35. Confusion : Fiction vs. Non-fiction Book Reviews

  36. Conclusions • Customer reviews are an excellent resource for studying humanities materials • Successful experiments: • High classification precision: genres; ratings; book vs. movie reviews; fiction vs. non-fiction book reviews • Reasonable confusions • Text mining techniques can help find important information about the materials being reviewed • Criticism mining: make the ever-growing consumer-generated review resources useful to humanities scholars

  37. Future work • More text mining techniques • decision trees, frequent pattern mining • Other critical texts • blogs, wikis, etc. • Other facets of reviews • “usage” in music reviews • Feature studies • answer the “why” questions

  38. References • Argamon, S., and Levitan, S. (2005). Measuring the Usefulness of Function Words for Authorship Attribution. Proceedings of the 17th Joint International Conference of ACH/ALLC. • Downie, J. S., Unsworth, J., Yu, B., Tcheng, D., Rockwell, G., and Ramsay, S. J. (2005). A Revolutionary Approach to Humanities Computing?: Tools Development and the D2K Data-Mining Framework. Proceedings of the 17th Joint International Conference of ACH/ALLC. • Hu, X., Downie, J. S., West, K., and Ehmann, A. (2005). Mining Music Reviews: Promising Preliminary Results. Proceedings of the Sixth International Conference on Music Information Retrieval (ISMIR). • Sebastiani, F. (2002). Machine Learning in Automated Text Categorization. ACM Computing Surveys, 34, 1. • Stamatatos, E., Fakotakis, N., and Kokkinakis, G. (2000). Text Genre Detection Using Common Word Frequencies. Proceedings of the 18th International Conference on Computational Linguistics.

  39. THE ANDREW W. MELLON FOUNDATION Questions? IMIRSEL Thank you!
