
PEBL: Web Page Classification without Negative Examples

Hwanjo Yu, Jiawei Han, and Kevin Chen-Chuan Chang. IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 1, 2004. Presented by Chirayu Wongchokprasitti.


Presentation Transcript


  1. PEBL: Web Page Classification without Negative Examples • Hwanjo Yu, Jiawei Han, and Kevin Chen-Chuan Chang • IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 1, 2004 • Presented by Chirayu Wongchokprasitti

  2. Introduction • Web page classification is one of the main techniques for Web mining • Constructing a classifier requires positive and negative training examples • Collecting negative training examples is laborious and must be done carefully to avoid bias

  3. Typical Learning Framework

  4. Positive Example Based Learning (PEBL) Framework • Learn from positive data and unlabeled data • Unlabeled data are random samples of the universal set • Apply the Mapping-Convergence (M-C) algorithm

  5. Mapping-Convergence (M-C) Algorithm • Divided into two stages • Mapping stage • Use any classifier that does not generate false negatives • They chose 1-DNF (monotone Disjunctive Normal Form) • Convergence stage • Use a classifier that maximizes the margin • They chose SVM (Support Vector Machine)

  6. Mapping Stage • Use a weak classifier to draw an initial approximation of “strong” negative data • First, identify strong positive features by comparing feature frequencies in the positive data and the unlabeled data • If a feature's frequency in the positive data is larger than its frequency in the universal (unlabeled) data, it is a strong positive feature • Filter out every unlabeled example that could be positive, leaving only the strong negatives
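The mapping stage above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes each page is represented as a set of word features, that "frequency" means the fraction of documents containing the feature, and that a strong negative is an unlabeled page containing no strong positive feature. The function names and the toy documents are hypothetical.

```python
from collections import Counter


def strong_positive_features(positive, unlabeled):
    """Return features that occur in a larger fraction of the positive
    documents than of the unlabeled documents (assumed frequency measure)."""
    pos_df = Counter(f for doc in positive for f in set(doc))
    unl_df = Counter(f for doc in unlabeled for f in set(doc))
    return {
        f for f, count in pos_df.items()
        if count / len(positive) > unl_df[f] / len(unlabeled)
    }


def strong_negatives(unlabeled, features):
    """Keep only unlabeled documents with no strong positive feature."""
    return [doc for doc in unlabeled if not (set(doc) & features)]


# Toy data (hypothetical): two positive pages and four unlabeled pages.
positive = [{"faculty", "course", "research"}, {"faculty", "student", "course"}]
unlabeled = [
    {"faculty", "course", "home"},
    {"shop", "price", "cart"},
    {"news", "sports", "home"},
    {"student", "forum", "home"},
]

features = strong_positive_features(positive, unlabeled)
negatives = strong_negatives(unlabeled, features)
print(sorted(features))
print(negatives)
```

Note that "home" is not selected because it is more frequent in the unlabeled pages than in the positive ones, so a page is not ruled out as a negative just for containing it.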

  7. Convergence Stage • Use SVM to narrow down the class boundary • Retrain the SVM repeatedly, each time extracting more negative data from the unlabeled data • The boundary converges to the true boundary
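The iteration in the convergence stage can be sketched as a loop that retrains a classifier and moves newly rejected unlabeled points into the negative set until nothing more is rejected. To keep the sketch free of external libraries, the SVM is replaced by a stand-in: for separable one-dimensional data, the maximum-margin boundary is simply the midpoint between the closest positive and negative points. The function names, the stopping rule, and the toy numbers are assumptions made for illustration.

```python
def train_midpoint(positive, negative):
    """Stand-in for an SVM (assumption): in 1-D, the max-margin boundary
    between two separable classes is the midpoint of their closest points."""
    threshold = (min(positive) + max(negative)) / 2
    return lambda x: x > threshold


def converge(positive, unlabeled, strong_negative, train):
    """Retrain until no remaining unlabeled point is classified negative."""
    negative = list(strong_negative)
    candidates = [x for x in unlabeled if x not in negative]
    rounds = 0
    while True:
        is_positive = train(positive, negative)
        newly_negative = [x for x in candidates if not is_positive(x)]
        if not newly_negative:
            return is_positive, negative, rounds
        # Accumulate the extracted negatives and keep only the undecided points.
        negative.extend(newly_negative)
        candidates = [x for x in candidates if is_positive(x)]
        rounds += 1


# Toy 1-D data (hypothetical): positives have large values.
positive = [8.0, 9.0, 10.0]
unlabeled = [0.0, 1.0, 3.0, 5.0, 6.0, 8.5, 9.5]
strong_negative = [0.0, 1.0]  # assumed output of the mapping stage

is_positive, negative, rounds = converge(
    positive, unlabeled, strong_negative, train_midpoint
)
print(negative, rounds)
```

In this toy run the boundary moves from 4.5 to 5.5, 6.5, and finally 7.0, stopping once no unlabeled point falls on the negative side. With a real SVM the same loop applies, with `train` returning a predicate built from the fitted model.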

  8. Support Vector Machines • Figure: visualization of a Support Vector Machine

  9. Convergence of SVM

  10. Data Flow Diagram

  11. Experimental Results • Results are reported as the precision-recall breakeven point (P-R) • Experiment 1: the Internet • Use DMOZ as the universal set • Experiment 2: University CS department • Use the WebKB data set • Mixture Models
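The precision-recall breakeven point is the value at which precision equals recall. One common way to compute it, sketched below, uses the fact that precision (TP / predicted positives) and recall (TP / actual positives) coincide when the number of predicted positives equals the number of actual positives. The sketch assumes each page has a real-valued classifier score and a 0/1 label, and it ignores tied scores; the slides do not say how the authors computed the measure, so the function name and data are hypothetical.

```python
def breakeven_point(scores, labels):
    """Precision (= recall) when exactly as many items are predicted
    positive as there are true positives in the data."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = sum(label for _, label in ranked[:n_pos])
    return true_pos / n_pos


# Toy scores and labels (hypothetical): 1 = positive page, 0 = negative page.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

bp = breakeven_point(scores, labels)
print(round(bp, 3))
```

Here three pages are truly positive, and two of the three top-scored pages are among them, so precision and recall are both 2/3 at that cutoff.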

  12. Experiment 1

  13. Experiment 2

  14. Mixture Models

  15. Summary and Conclusions • The PEBL framework eliminates the need to collect negative training examples manually • The Mapping-Convergence (M-C) algorithm achieves classification accuracy as high as that of a traditional SVM • PEBL needs a faster training procedure
