PEBL: Web Page Classification without Negative Examples

Hwanjo Yu, Jiawei Han, and Kevin Chen-Chuan Chang
IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 1, 2004
Presented by Chirayu Wongchokprasitti

Introduction
• Web page classification is one of the main techniques for Web mining
• Constructing a classifier requires positive and negative training examples
• Collecting negative training examples is laborious and must be done carefully to avoid bias

Typical Learning Framework

Positive Example Based Learning (PEBL) Framework
• Learn from positive data and unlabeled data
• The unlabeled data are random samples of the universal set
• Apply the Mapping-Convergence (M-C) algorithm

Mapping-Convergence (M-C) Algorithm
• Divided into two stages
• Mapping stage
  • Use any classifier that does not generate false negatives
  • The authors chose 1-DNF (monotone Disjunctive Normal Form)
• Convergence stage
  • Use a classifier that maximizes the margin
  • The authors chose SVM (Support Vector Machine)

Mapping Stage
• Use a weak classifier to draw an initial approximation of the "strong" negative data.
• First, identify strong positive features by comparing each feature's frequency in the positive data and in the unlabeled data.
• If a feature is more frequent in the positive data than in the universal (unlabeled) data, it is a strong positive feature.
• Filter out every unlabeled example that could be positive (in 1-DNF, any example containing a strong positive feature), leaving only strong negatives.
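The mapping stage can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: each page is assumed to be a set of feature tokens, "frequency" is taken to mean the fraction of pages containing the feature, and the function names `strong_positive_features` and `mapping_stage` are hypothetical.

```python
from collections import Counter


def strong_positive_features(positives, unlabeled):
    """Return features that are relatively more frequent in P than in U.

    Assumption: frequency = fraction of pages that contain the feature.
    """
    pos_df = Counter(f for page in positives for f in set(page))
    unl_df = Counter(f for page in unlabeled for f in set(page))
    strong = set()
    for feature, count in pos_df.items():
        if count / len(positives) > unl_df[feature] / len(unlabeled):
            strong.add(feature)
    return strong


def mapping_stage(positives, unlabeled):
    """Split U into strong negatives and the rest (possible positives)."""
    strong = strong_positive_features(positives, unlabeled)
    # A page with no strong positive feature at all is a strong negative.
    strong_negatives = [p for p in unlabeled if not (set(p) & strong)]
    remaining = [p for p in unlabeled if set(p) & strong]
    return strong, strong_negatives, remaining


# Toy data (made up for illustration only).
positives = [{"faculty", "research", "phd"}, {"faculty", "teaching"}]
unlabeled = [
    {"faculty", "research"},
    {"shop", "cart"},
    {"news", "sports"},
    {"research", "lab"},
]
strong, strong_negatives, remaining = mapping_stage(positives, unlabeled)
```

In this toy run "research" is not a strong positive feature because it is equally frequent in both sets, so the page `{"research", "lab"}` ends up among the strong negatives. The filter is deliberately conservative: it may leave many true negatives in `remaining`, but it should not discard a likely positive.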

Convergence Stage
• Use SVM to narrow the class boundary
• Retrain the SVM iteratively, each time extracting more negative data from the remaining unlabeled data
• The boundary converges toward the true boundary
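The iteration in the convergence stage can be sketched as the loop below. Two things here are assumptions rather than details from the slides: the loop stops when a round finds no new negatives, and a simple nearest-centroid rule (`fit_nearest_centroid`, a hypothetical helper) stands in for SVM training so that the example runs with the standard library alone. Any trainer with the same `fit(positives, negatives) -> predict` shape, such as a real SVM, could be passed in instead.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def fit_nearest_centroid(positives, negatives):
    """Stand-in for SVM training (NOT an SVM): returns predict(x) -> bool."""
    cp, cn = centroid(positives), centroid(negatives)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return lambda x: dist2(x, cp) < dist2(x, cn)


def convergence_stage(positives, strong_negatives, remaining,
                      fit=fit_nearest_centroid):
    """Grow the negative set until a round extracts no new negatives.

    Assumes positives and strong_negatives are both non-empty.
    """
    negatives = list(strong_negatives)
    remaining = list(remaining)
    while True:
        predict = fit(positives, negatives)
        newly_negative = [x for x in remaining if not predict(x)]
        if not newly_negative:
            return predict, negatives, remaining
        negatives.extend(newly_negative)
        remaining = [x for x in remaining if predict(x)]


# Toy one-dimensional data (made up for illustration only).
positives = [[9.0], [10.0]]
strong_negatives = [[0.0], [1.0]]
unlabeled_rest = [[3.0], [5.2], [8.0], [9.5]]
predict, negatives, still_unlabeled = convergence_stage(
    positives, strong_negatives, unlabeled_rest
)
```

On this toy data the first round moves `[3.0]` into the negative set, which shifts the boundary enough for the second round to pick up `[5.2]`; the third round finds nothing new and the loop stops. The loop always terminates because `remaining` shrinks in every round that continues.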

Support Vector Machines
[Figure: visualization of a support vector machine]

Convergence of SVM

Data Flow Diagram

Experimental Results
• Results are reported as the precision-recall breakeven point (P-R)
• Experiment 1: the Internet
  • Uses DMOZ as the universal set
• Experiment 2: university CS departments
  • Uses the WebKB data set
• Mixture models
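The precision-recall breakeven point is the value at which precision equals recall. One common way to compute it, sketched here as an assumption since the slides do not say how the authors computed it, is to rank the test pages by classifier score and label the top k as positive, where k is the number of truly positive pages; at that cutoff precision and recall are equal by construction. The function name `pr_breakeven` and the toy scores are hypothetical.

```python
def pr_breakeven(scores, labels):
    """Precision (= recall) when exactly n_pos pages are predicted positive.

    scores: classifier scores, higher means more likely positive.
    labels: 1 for a truly positive page, 0 otherwise.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = sum(label for _, label in ranked[:n_pos])
    # Predicted positives == actual positives == n_pos,
    # so precision == recall == hits / n_pos.
    return hits / n_pos


# Toy scores and labels (made up for illustration only).
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 0, 1, 1, 0]
breakeven = pr_breakeven(scores, labels)  # 2 of the top 3 are positive
```

A single number like this is convenient for comparing classifiers, because neither precision nor recall alone says much when the positive class is a small fraction of the universal set.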

Experiment 1

Experiment 2

Mixture Models

Summary and Conclusions
• The PEBL framework eliminates the need to collect negative training examples manually
• The Mapping-Convergence (M-C) algorithm achieves classification accuracy as high as that of a traditional SVM
• PEBL still needs a faster training procedure, since M-C retrains the SVM at every iteration
