
Opinion Mining and Topic Categorization with Novel Term Weighting


Presentation Transcript


  1. Opinion Mining and Topic Categorization with Novel Term Weighting Roman Sergienko, Ph.D. student, Tatiana Gasanova, Ph.D. student, Ulm University, Germany; Shaknaz Akhmedova, Ph.D. student, Siberian State Aerospace University, Krasnoyarsk, Russia

  2. Contents • Motivation • Databases • Text preprocessing methods • The novel term weighting method • Feature selection • Classification algorithms • Results of numerical experiments • Conclusions

  3. Motivation • The goal of the work is to evaluate the competitiveness of the novel term weighting in comparison with the standard techniques for opinion mining and topic categorization. • The criteria are: • Macro F-measure on the test set • Computational time

  4. Databases: DEFT’07 and DEFT’08

  5. The existing text preprocessing methods • Binary preprocessing • TF-IDF (Salton and Buckley, 1988) • Confident Weights (Soucy and Mineau, 2005)
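
These three schemes are standard baselines. As a point of reference, a minimal TF-IDF computation in Python is sketched below; the exact normalization used in the experiments is not stated on the slide, so raw term frequency and log(N/df) are assumed.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Compute TF-IDF weights for a list of tokenized documents.

    A minimal sketch of the classic Salton-and-Buckley-style scheme;
    raw term frequency and log(N/df) are assumed here.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each word appears
    df = Counter()
    for doc in documents:
        df.update(set(doc))

    weights = []
    for doc in documents:
        tf = Counter(doc)
        weights.append({
            word: count * math.log(n_docs / df[word])
            for word, count in tf.items()
        })
    return weights

# Example usage with toy utterances
docs = [["good", "service"], ["bad", "service"], ["good", "price"]]
print(tf_idf(docs))
```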

  6. The novel term weighting method L – the number of classes; n_i – the number of instances of the i-th class; N_ji – the number of occurrences of the j-th word in all instances of the i-th class; T_ji = N_ji / n_i – the relative frequency of the j-th word in the i-th class; R_j = max_i T_ji, S_j = arg max_i T_ji – the index of the class assigned to the j-th word.
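
Slide 6 defines T_ji, R_j and S_j, but the transcript does not show the final weight assigned to a word. The sketch below computes R_j and S_j directly from those definitions; treating R_j itself as the word's weight is an assumption, as is the data layout (a dict of tokenized utterances per class).

```python
from collections import Counter, defaultdict

def novel_term_weights(class_docs):
    """class_docs: dict mapping class label -> list of tokenized utterances.

    Returns, for every word j, the pair (R_j, S_j) as defined on slide 6:
    T_ji = N_ji / n_i, R_j = max_i T_ji, S_j = arg max_i T_ji.
    """
    # T[word][class] holds T_ji, the relative frequency of the word in that class
    T = defaultdict(dict)
    for label, docs in class_docs.items():
        n_i = len(docs)                                   # number of instances of class i
        counts = Counter(w for doc in docs for w in doc)  # N_ji for every word j
        for word, n_ji in counts.items():
            T[word][label] = n_ji / n_i                   # T_ji

    weights = {}
    for word, per_class in T.items():
        # Maximum relative frequency and the class where it is reached
        s_j, r_j = max(per_class.items(), key=lambda kv: kv[1])
        weights[word] = (r_j, s_j)   # (R_j, S_j); using R_j as the weight is an assumption
    return weights
```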

  7. Feature selection • Calculate the relative frequency of each word in each class • For each word, choose the class with the maximum relative frequency • For each utterance to be classified, calculate the sum of the weights of the words assigned to each class • Number of attributes = number of classes
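
A possible implementation of this step, reusing the (R_j, S_j) pairs from the previous sketch; the function name and the representation of an utterance as a token list are assumptions.

```python
def class_sum_features(utterance, weights, classes):
    """Map a tokenized utterance to one attribute per class: the sum of the
    weights R_j of its words whose assigned class S_j is that class.
    Words not seen during training are skipped.
    """
    sums = {label: 0.0 for label in classes}
    for word in utterance:
        if word in weights:
            r_j, s_j = weights[word]
            sums[s_j] += r_j
    # Fixed-order vector: number of attributes = number of classes
    return [sums[label] for label in classes]
```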

  8. Classification algorithms • k-nearest neighbors algorithm with distance weighting (k varied from 1 to 15); • kernel Bayes classifier with Laplace correction; • neural network with error back propagation (standard settings in RapidMiner); • Rocchio classifier with different metrics and parameters; • support vector machine (SVM) generated and optimized with Co-Operation of Biology Related Algorithms (COBRA) (Akhmedova and Semenkin, 2013).
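
For the first classifier in this list, a distance-weighted k-NN sweep over k = 1..15 scored with macro F-measure (the criterion from slide 3) could be set up as below; the use of scikit-learn is an assumption, since the presentation relies on RapidMiner and custom implementations.

```python
# Assumes scikit-learn; the presentation itself used RapidMiner and custom code.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import f1_score

def evaluate_knn(X_train, y_train, X_test, y_test, k_values=range(1, 16)):
    """Distance-weighted k-NN for k = 1..15, reporting macro F-measure per k."""
    results = {}
    for k in k_values:
        clf = KNeighborsClassifier(n_neighbors=k, weights="distance")
        clf.fit(X_train, y_train)
        results[k] = f1_score(y_test, clf.predict(X_test), average="macro")
    return results
```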

  9. Computational effectiveness: DEFT'07 and DEFT'08

  10. The best values of F-measure

  11. Comparison of ConfWeight and the novel term weighting

  12. Conclusions • The novel term weighting method gives classification quality similar to or better than the ConfWeight method, while requiring only as much computational time as TF-IDF.
