One-class Classification of Text Streams with Concept Drift

One-class Classification of Text Streams with Concept Drift. Yang ZHANG, Xue LI , Maria Orlowska DDDM 2008 The University of Queensland Australia. Outline. Motivation Related Work Framework for One-class Classification of Data Stream Learning Concept Drift under One-class Scenario


Presentation Transcript


  1. One-class Classification of Text Streams with Concept Drift Yang ZHANG, Xue LI, Maria Orlowska DDDM 2008 The University of Queensland Australia

  2. Outline • Motivation • Related Work • Framework for One-class Classification of Data Stream • Learning Concept Drift under One-class Scenario • Experiment Result • Future Work

  3. Motivation • State-of-the-art data stream classification algorithms: • Based on fully labeled data. • Impossible to label all data. • Expensive to label data. • User interests change. • Difficult to apply to real-life applications.

  4. Scenario • User feedback emails sent to the customer service section: • Finding the feedback emails about a certain newly launched product. • Building a text data stream classification system to retrieve all the ontopic feedback emails. • Section manager behavior: • Patient enough to label only a few ontopic emails. • Not patient enough to label offtopic emails.

  5. One-class Classification of Text Stream • Challenges • Concept drift. • Small amount of training data. • No negative training data. • Noisy data. • Limited memory space.

  6. Related Work • Semi-supervised classification of data streams; cannot cope with concept drift. • [Wu&Yang, ICDMW06] • Active learning for data stream classification; cannot cope with concept drift caused by a sudden shift of user interests. • [Fan&Huang, SDM04] [Fan&Huang, ICDM04] [Huang&Dong, IDA07] • Needs multiple scans. • [Klinkenberg&Joachims, ICML00]

  7. Related Work • Static approaches for data stream classification (fully labelled). • [Street&Kim, KDD01] [Wang&Fan, KDD03] • Dynamic approaches for data stream classification (fully labelled). • [Kolter&Maloof, ICDM03] [Zhang&Jin, SIGMOD Record 06] [Zhu&Wu, ICDM04] • One-class text classification. • [Li&Liu, ECML05] [Liu&Dai, ICDM03] [Liu&Li, AAAI04]

  8. Proposed Approach

  9. Base Classifier Selection – phenomena observed • If the reader is very interested in a certain topic today, say, sports, then there is a high probability that he is also interested in sports tomorrow. • If the reader is interested in a topic, say, sports, and for some reason his interests change to another topic, say, politics, then after some time there is a high probability that his interests will change back to sports again.

  10. Base Classifier Selection - strategy • The ensemble should keep some recent base classifiers. • The ensemble should keep some base classifiers which represent the reader's interests in the long run.
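
The two-part strategy on slide 10 could be sketched roughly as follows. This is only an illustration under assumptions: the slides do not say how long-run classifiers are chosen, so the class name `TwoTierEnsemble`, the `score` value attached to each base classifier, the pool sizes `n_recent` and `n_long_term`, and the majority vote are all hypothetical choices, not the authors' algorithm.

```python
from collections import deque
from typing import Callable, List

# A base classifier maps a document to True (ontopic) or False (offtopic).
Classifier = Callable[[str], bool]


class TwoTierEnsemble:
    """Keeps the newest base classifiers plus the best-scoring older ones."""

    def __init__(self, n_recent: int = 2, n_long_term: int = 2) -> None:
        self.n_recent = n_recent        # hypothetical size of the recent pool
        self.n_long_term = n_long_term  # hypothetical size of the long-run pool
        self.recent = deque()           # (score, name, clf), newest last
        self.long_term = []             # retired classifiers, best score first

    def add(self, name: str, clf: Classifier, score: float) -> None:
        """Add the classifier trained on the newest batch.

        `score` is an assumed quality measure, e.g. agreement with the few
        positive labels supplied by the user.
        """
        self.recent.append((score, name, clf))
        if len(self.recent) > self.n_recent:
            # The oldest recent classifier retires into the long-run pool,
            # which keeps only the highest-scoring members.
            self.long_term.append(self.recent.popleft())
            self.long_term.sort(key=lambda member: member[0], reverse=True)
            del self.long_term[self.n_long_term:]

    def members(self) -> List[str]:
        return [name for _, name, _ in list(self.long_term) + list(self.recent)]

    def predict(self, doc: str) -> bool:
        """Unweighted majority vote over both pools (an assumption)."""
        votes = [clf(doc) for _, _, clf in list(self.long_term) + list(self.recent)]
        return sum(votes) * 2 > len(votes)
```

With this design, a classifier that captured an earlier interest (say, sports) can survive in the long-run pool while the recent pool tracks the current interest, which matches the "interests change back" observation on slide 9.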

  11. Experiment Result • Dataset: 20 Newsgroups • We compare the following approaches: • Single Window (SW): The classifier is built on the current batch of data. • Full Memory (FM): The classifier is built on the current batch of data, together with positive samples dating back to batch 0. • Fixed Window (FW): The classifier is built on the samples from a fixed-size window. • Ensemble (EN): The classifier is built by the algorithms proposed in this paper.
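
The three baseline strategies on slide 11 differ only in which samples are handed to the learner. A minimal sketch of that selection step is shown here; the `Batch` structure, the `training_data` function, and the `window` parameter are hypothetical names, and the reading of FW as "all samples from the last `window` batches" is an assumption, since the slides do not define the window precisely.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Batch:
    positives: List[str]  # the few ontopic documents labeled by the user
    unlabeled: List[str]  # everything else in the batch (no negative labels)


def training_data(batches: List[Batch], t: int, strategy: str,
                  window: int = 2) -> Tuple[List[str], List[str]]:
    """Return (positive, unlabeled) training samples for batch index t."""
    current = batches[t]
    if strategy == "SW":
        # Single Window: the current batch only.
        return list(current.positives), list(current.unlabeled)
    if strategy == "FM":
        # Full Memory: current batch plus positives back to batch 0.
        positives = [doc for b in batches[:t + 1] for doc in b.positives]
        return positives, list(current.unlabeled)
    if strategy == "FW":
        # Fixed Window: samples from the last `window` batches (assumed reading).
        recent = batches[max(0, t - window + 1):t + 1]
        return ([doc for b in recent for doc in b.positives],
                [doc for b in recent for doc in b.unlabeled])
    raise ValueError(f"unknown strategy: {strategy}")
```

For example, with three batches and `t = 2`, SW uses only batch 2, FM adds the positives of batches 0 and 1, and FW with `window=2` uses batches 1 and 2 in full.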

  12. Experiment Scenarios • 4 groups of experiments: • Experiment with concept drift caused by changing of user interests. • Experiment with heavy vs. gradual concept drift. • Experiment with concept drift caused by both changing of user interests and data distribution. • Experiment with 5 ontopic categories.

  13. Experiment: concept drift caused by changing of user interests.

  14. Experiment: concept drift caused by changing of user interests.

  15. Experiment: concept drift caused by changing of user interests.

  16. Experiment: heavy vs. gradual concept drift.

  17. Experiment: heavy vs. gradual concept drift.

  18. Experiment: heavy vs. gradual concept drift.

  19. Experiment: changing of user interests & data distribution. • Very similar to the results observed in the first group of experiments.

  20. Experiment: 5 ontopic categories.

  21. Experiment: 5 ontopic categories.

  22. Experiment: 5 ontopic categories.

  23. Conclusion & Future Research • We are the first to tackle the problem of one-class classification on streaming data, using an ensemble-based approach. • Future research • Dynamic feature space • One-class classification on general data streams.

  24. Thank you! 
