This study introduces Negative Data Selection for efficient SVM-based web page classification, addressing the challenge of increased training time and improving search engine organization under topic hierarchies. The method involves feature selection and cosine similarity for optimal classifier performance. Experimental results on the Reuters dataset validate the approach.
Multi-class SVM with Negative Data Selection for Web Page Classification
Chih-Ming Chen, Hahn-Ming Lee and Ming-Tyan Kao
International Joint Conference on Neural Networks 2004
Motivation
• Several new websites are launched every day
• Users need to search them quickly and efficiently
• Search engines organize websites under a topic hierarchy (taxonomy)
• This requires a classifier: one-against-all SVM
• Catch: the huge amount of negative data increases training time
Negative Data Selection
Support vectors in the negative data are much more similar to the positive data than the other negative data are
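Negative Data Selection
• Feature selection: take the top n keywords from the positive data
• All websites are represented as vectors over these top n keywords
• Cosine similarity scores each negative document against the positive data (standard definition): sim(x, y) = (x · y) / (‖x‖ ‖y‖)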
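The feature selection and similarity steps above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and ranking keywords by raw term frequency with whitespace tokenization is an assumption, since the slides do not say how the top n keywords are chosen.

```python
import math
from collections import Counter


def top_keywords(positive_docs, n):
    """Return the n most frequent tokens across the positive documents.

    Assumption: keywords are ranked by raw frequency; the slides do not
    specify the ranking criterion.
    """
    counts = Counter(tok for doc in positive_docs for tok in doc.lower().split())
    return [tok for tok, _ in counts.most_common(n)]


def to_vector(doc, keywords):
    """Represent a document as term counts over the selected keywords."""
    counts = Counter(doc.lower().split())
    return [counts[k] for k in keywords]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A negative page that shares none of the positive class's keywords maps to the zero vector and scores 0.0, so it ranks last when the scores are sorted.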
Negative Data Selection
• Plot the similarity scores of the negative documents to the positive documents in descending order
[Figure: similarity scores in descending order plotted against negative documents, with the convergence point marked]
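One way to turn the sorted-score curve into a selection rule is sketched below. The slides do not define how the convergence point is located, so the rule here is an assumption: the curve is treated as converged at the first rank where the next `window` consecutive drops are all smaller than `tol`. The function name and both parameters are hypothetical, and keeping only the negatives ranked before that point is likewise an assumed reading of the plot.

```python
def select_negatives(scores, tol=0.01, window=3):
    """Return indices of negative documents ranked before the convergence point.

    scores[i] is the similarity of negative document i to the positive data.
    Assumed rule (not from the slides): converge at the first rank where the
    next `window` consecutive drops in the sorted scores are all below `tol`.
    """
    # Rank the negative documents by similarity, highest first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [scores[i] for i in order]

    cut = len(ranked)  # keep everything if the curve never flattens
    for start in range(len(ranked) - window):
        drops = [ranked[j] - ranked[j + 1] for j in range(start, start + window)]
        if all(d < tol for d in drops):
            cut = start
            break
    return order[:cut]


# Toy scores: three negatives stand out, the rest sit on a flat tail.
kept = select_negatives([0.9, 0.1, 0.6, 0.1, 0.1, 0.3, 0.1])
```

Only the negatives in `kept` would then be passed to the one-against-all SVM, which is what reduces the training set. Note that this heuristic returns an empty list if the top of the curve is itself flat, so `tol` and `window` would need tuning on real data.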
Experiments
• Reuters dataset (10802 training, 565 test)