Text classification: In Search of a Representation

Presentation Transcript

  1. Text classification: In Search of a Representation Stan Matwin School of Information Technology and Engineering University of Ottawa

  2. Outline • Supervised learning = classification • ML/DM at U of O • Classical approach • Attempt at a linguistic representation • N-grams – how to get them? • Labelling and co-learning • Next steps?…

  3. Supervised learning (classification) Given: • a set of training instances T = {<e, t>}, where each t is a class label: one of the classes C1, …, Ck • a concept with k classes C1, …, Ck (but the definition of the concept is NOT known) Find: • a description for each class which will perform well in determining (predicting) class membership for unseen instances

  4. Classification • Prevalent practice: examples are represented as vectors of attribute values • Theoretical wisdom, confirmed empirically: the more examples, the better the predictive accuracy

  5. ML/DM at U of O • Learning from imbalanced classes: applications in remote sensing • A relational, rather than propositional, representation: learning the maintainability concept • Learning in the presence of background knowledge: Bayesian belief networks and how to get them; application to distributed databases

  6. Why text classification? • Automatic file saving • Internet filters • Recommenders • Information extraction • …

  7. Text classification: standard approach (the "bag of words" representation) • Remove stop words and markings • The remaining words are all attributes • A document becomes a vector of <word, frequency> pairs • Train a boolean classifier for each class • Evaluate the results on an unseen sample
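
A minimal sketch of the bag-of-words step above, assuming a toy stop-word list and a crude regular-expression tokenizer (both are placeholders, not the preprocessing actually used in the experiments):

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real system would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "over"}

def bag_of_words(document: str) -> Counter:
    """Strip non-word characters, drop stop words, and count the rest."""
    words = re.findall(r"[a-z]+", document.lower())   # crude tokenizer
    return Counter(w for w in words if w not in STOP_WORDS)

doc = "The quick brown fox jumps over the lazy dog; the dog sleeps."
print(bag_of_words(doc))
# Counter({'dog': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'jumps': 1, 'lazy': 1, 'sleeps': 1})
```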

  8. Text classification: tools • RIPPER: a "covering" learner; works well with large sets of binary features • Naïve Bayes: efficient (no search), simple to program, gives a "degree of belief"
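
RIPPER has no widely used Python implementation, so the sketch below only illustrates the Naïve Bayes side, using scikit-learn (an assumption, not the tooling from the talk); predict_proba plays the role of the "degree of belief" mentioned above:

```python
# Minimal Naive Bayes text classifier on made-up documents (illustration only).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_docs = ["grain prices rise", "wheat grain exports fall",
              "stocks rally on earnings", "shares and stocks climb"]
train_labels = ["grain", "grain", "markets", "markets"]

vectorizer = CountVectorizer()                 # bag-of-words features
X = vectorizer.fit_transform(train_docs)
clf = MultinomialNB().fit(X, train_labels)

test = vectorizer.transform(["grain stocks fall"])
print(clf.predict(test))            # predicted class
print(clf.predict_proba(test))      # the "degree of belief" per class
```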

  9. “Prior art” • Yang: best results using k-NN: 82.3% micro-averaged accuracy • Joachims' results using Support Vector Machines + unlabelled data • SVM insensitive to high dimensionality and sparseness of examples

  10. SVM in text classification • SVM: training with 17 examples in the 10 most frequent categories gives test performance of 60% on 3000+ test cases • Transductive SVM: the test cases are available during training; the maximum separation margin is computed for the test set
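
For comparison, a standard (inductive) SVM text classifier is easy to set up; the transductive variant from the slide is not available in scikit-learn, so this sketch, on made-up data, shows only the inductive case over high-dimensional, sparse TF-IDF features:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

docs = ["oil prices climb", "crude oil output cut",
        "team wins final match", "players score in match"]
labels = ["energy", "energy", "sport", "sport"]

# TF-IDF gives the high-dimensional, sparse representation SVMs tolerate well.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(docs, labels)
print(model.predict(["oil prices fall"]))
```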

  11. Problem 1: aggressive feature selection

  12. Problem 2: semantic relationships are missed

  13. Proposed solution (Sam Scott) • Get noun phrases and/or key phrases (Extractor) and add to the feature list • Add hypernyms

  14. Hypernyms - WordNet
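
A sketch of pulling hypernyms out of WordNet to add as extra features, assuming NLTK with the WordNet corpus installed (nltk.download('wordnet')); the Extractor tool and the exact feature scheme from the talk are not reproduced here:

```python
from nltk.corpus import wordnet as wn

def hypernym_features(word: str, depth: int = 2) -> set:
    """Collect hypernym lemma names up to `depth` levels above the word."""
    features = set()
    frontier = wn.synsets(word)
    for _ in range(depth):
        parents = [h for s in frontier for h in s.hypernyms()]
        features.update(name for p in parents for name in p.lemma_names())
        frontier = parents
    return features

print(hypernym_features("car"))
# e.g. {'motor_vehicle', 'automotive_vehicle', 'wheeled_vehicle', ...}
```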

  15. Evaluation (Lewis) • Vary the “loss ratio” parameter • For each parameter value • Learn a hypothesis for each class (binary classification) • Micro-average the confusion matrices (add component-wise) • Compute precision and recall • Interpolate (or extrapolate) to find the point where micro-averaged precision and recall are equal
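
A small worked example of the micro-averaging step, with made-up per-class counts: the confusion-matrix entries are summed component-wise, and precision and recall are then computed from the pooled totals.

```python
# One (TP, FP, FN) triple per binary classifier (illustrative numbers).
per_class_counts = [
    (50, 10, 5),
    (20, 4, 10),
    (5, 1, 8),
]

tp = sum(c[0] for c in per_class_counts)
fp = sum(c[1] for c in per_class_counts)
fn = sum(c[2] for c in per_class_counts)

precision = tp / (tp + fp)   # pooled precision
recall = tp / (tp + fn)      # pooled recall
print(f"micro-averaged precision={precision:.3f}, recall={recall:.3f}")
# The break-even point is found by varying the loss-ratio parameter until
# these two values coincide (interpolating between neighbouring settings).
```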

  16. Results • No gain over the bag of words (BW) with the alternative representations • But… comprehensibility…

  17. Combining classifiers Comparable to best known results (Yang)

  18. Other possibilities • Using hypernyms with a small training set (avoids ambiguous words) • Use Bayes+Ripper in a cascade scheme (Gama) • Other representations:

  19. Collocations • Do not need to be noun phrases, just pairs of words possibly separated by stop words • Only the well-discriminating ones are chosen • These are added to the bag of words, and… • RIPPER is run on the result
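
A rough sketch of the collocation idea: ordered pairs of content words that become adjacent once stop words are skipped (the stop-word list is a placeholder, and the discriminativeness filter is only mentioned in a comment):

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on"}

def collocations(document: str) -> Counter:
    words = re.findall(r"[a-z]+", document.lower())
    content = [w for w in words if w not in STOP_WORDS]
    # Adjacent content words form a candidate collocation (stop words skipped).
    return Counter(zip(content, content[1:]))

print(collocations("The board of directors approved the merger of the firms"))
# Counter({('board', 'directors'): 1, ('directors', 'approved'): 1, ...})
# Only the pairs that discriminate well between classes would then be added
# to the bag of words and passed to RIPPER.
```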

  20. N-grams • N-grams are substrings of a given length • Good results on Reuters [Mladenic, Grobelnik] with Bayes; we try RIPPER • A different task: classifying text files (attachments, audio/video, coded) • From n-grams to relational features

  21. How to get good n-grams? • We use Ziv-Lempel for frequent substring detection (.gz!) • [Figure: Ziv-Lempel parse of the example string abababa]
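
A minimal LZ78-style parse that collects the phrases it adds to its dictionary; gzip itself uses the LZ77 variant, so this is only an illustration of how Ziv-Lempel parsing exposes recurring substrings such as those in abababa:

```python
def lz_phrases(text: str) -> list:
    """Return the dictionary phrases produced by an LZ78-style parse."""
    phrases = set()
    out = []
    current = ""
    for ch in text:
        current += ch
        if current not in phrases:     # longest known prefix extended by ch
            phrases.add(current)
            out.append(current)
            current = ""
    if current:
        out.append(current)            # trailing, already-seen phrase
    return out

print(lz_phrases("abababa"))   # ['a', 'b', 'ab', 'aba']
```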

  22. N-grams • Counting • Pruning: substring occurrence ratio < acceptance threshold • Building relations: string A almost always precedes string B • Feeding into relational learner (FOIL)
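
A hedged sketch of the relation-building step: deciding whether one substring "almost always" precedes another across the files of a class (the threshold and the data are illustrative); facts of this form can then be handed to a relational learner such as FOIL.

```python
def almost_always_precedes(a: str, b: str, files: list,
                           threshold: float = 0.9) -> bool:
    """True if, in files containing both strings, a precedes b often enough."""
    relevant = [f for f in files if a in f and b in f]
    if not relevant:
        return False
    before = sum(1 for f in relevant if f.index(a) < f.index(b))
    return before / len(relevant) >= threshold

files = ["HEADER...BODY...", "HEADER..BODY", "BODY HEADER"]
print(almost_always_precedes("HEADER", "BODY", files))   # False (2 of 3)
```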

  23. Using grammar induction (text files) • Idea: detect patterns of substrings • Patterns are regular languages • Methods of automata induction: a recognizer for each class of files • We use a modified version of RPNI2 [Dupont, Miclet]
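
RPNI-family algorithms start from a prefix tree acceptor built over the positive sample strings and then merge states; the sketch below shows only that initial PTA construction, not the state-merging search of RPNI2 itself.

```python
def prefix_tree_acceptor(samples: list):
    """Return (transitions, accepting_states) of the PTA over the samples."""
    transitions = {}                    # (state, symbol) -> next state
    accepting = set()
    next_state = 1                      # state 0 is the initial state
    for word in samples:
        state = 0
        for symbol in word:
            if (state, symbol) not in transitions:
                transitions[(state, symbol)] = next_state
                next_state += 1
            state = transitions[(state, symbol)]
        accepting.add(state)            # end of a positive sample
    return transitions, accepting

print(prefix_tree_acceptor(["ab", "aba", "b"]))
```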

  24. What’s new… • Work with marked up text (Word, Web) • XML with semantic tags: mixed blessing for DM/TM • Co-learning • Text mining

  25. Co-learning • How to use unlabelled data? Or: how to limit the number of examples that need to be labelled? • Two classifiers and two redundantly sufficient representations • Train both, run both on the test set • Add the best predictions to the training set
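
A compact co-training loop on synthetic data (a sketch after Blum and Mitchell, not the exact procedure from the talk): two Naive Bayes classifiers, one per view, each adding its most confident prediction, with the predicted label, to the shared labelled set.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Two redundant views of the same 200 instances (e.g. page text vs anchor text).
view1 = rng.normal(size=(200, 5)) + np.repeat([[0.0], [2.0]], 100, axis=0)
view2 = rng.normal(size=(200, 5)) + np.repeat([[0.0], [2.0]], 100, axis=0)
true_y = np.repeat([0, 1], 100)

seed = rng.choice(200, size=10, replace=False)
lab_idx, lab_y = list(seed), list(true_y[seed])          # small labelled set
unl_idx = [i for i in range(200) if i not in set(seed)]  # unlabelled pool

for _ in range(5):
    c1 = GaussianNB().fit(view1[lab_idx], lab_y)
    c2 = GaussianNB().fit(view2[lab_idx], lab_y)
    # Each classifier labels the unlabelled example it is most confident about.
    for clf, view in ((c1, view1), (c2, view2)):
        proba = clf.predict_proba(view[unl_idx])
        best = int(np.argmax(proba.max(axis=1)))
        lab_idx.append(unl_idx[best])
        lab_y.append(int(clf.classes_[proba[best].argmax()]))  # predicted label
        del unl_idx[best]

print("labelled set grew from 10 to", len(lab_idx))
```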

  26. Co-learning • The training set grows as… • …each learner predicts independently, thanks to redundant sufficiency (different representations) • Would it also work with our learners if we used Bayes? • Would work for classifying emails

  27. Co-learning • Mitchell experimented with the task of classifying web pages (profs, students, courses, projects) – a supervised learning task • Used two views: anchor text and page contents • Error rate halved (from 11% to 5%)

  28. Cog-sci? • Co-learning seems to be cognitively justified • Model: students learning in groups (pairs) • What other social learning mechanisms could provide models for supervised learning?

  29. Conclusion • A practical task, needs a solution • No satisfactory solution so far • Fruitful ground for research