This research delves into improving text categorization via unsupervised learning, addressing challenges in Intensional Learning. The proposed method maps similarity values using Gaussian Mixtures. Competitive performance was achieved with minimal supervision.
Improving Text Categorization Bootstrapping via Unsupervised Learning
Authors: Alfio Gliozzo, Ido Dagan (TSLP, 2009)
Presenter: Bo-Sheng Wang
Outline • Motivation • Objectives • Methodology • Evaluation • Experiments • Conclusions • Comments
Motivation Supervised systems for text categorization require large amounts of hand-labeled texts. Intensional Learning (IL) inherently suffers from a score scaling problem and has very little information about the intension of a category.
Objectives Investigate and address two specific weaknesses that inherently affect the IL schema, using: • Latent Semantic Indexing (LSI) • Gaussian Mixture (GM) algorithm
Methodology-Gaussian Mixture Algorithm This paper proposes mapping the similarity values into class posterior probabilities using unsupervised estimation of Gaussian mixtures.
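A minimal sketch of the idea, not the authors' implementation: it assumes that the similarity scores of documents to one category can be modelled as a 1-D mixture of two Gaussians (one for in-category documents, one for the rest), fits the mixture with plain EM, and reads the posterior of the higher-mean component as the class posterior. The function names, the two-component choice, the initialisation, and the sample scores are all assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_gaussians(scores, iterations=100, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture to `scores` with EM.

    Hypothetical helper: no labels are used, only the raw similarity scores.
    """
    lo, hi = min(scores), max(scores)
    # Assumed initialisation: one component at each end of the score range.
    means = [lo, hi]
    spread = max((hi - lo) ** 2 / 4.0, min_var)
    variances = [spread, spread]
    weights = [0.5, 0.5]

    for _ in range(iterations):
        # E-step: responsibility of each component for each score.
        resp = []
        for x in scores:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in (0, 1)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([p[0] / total, p[1] / total])

        # M-step: re-estimate weight, mean, and variance of each component.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component received no mass; keep its old parameters
            weights[k] = nk / len(scores)
            means[k] = sum(r[k] * x for r, x in zip(resp, scores)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, scores)) / nk
            variances[k] = max(var, min_var)  # floor avoids a collapsed component

    return weights, means, variances


def posterior_in_category(x, weights, means, variances):
    """Posterior that score `x` came from the higher-mean (in-category) component."""
    k_pos = 0 if means[0] > means[1] else 1
    p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in (0, 1)]
    total = sum(p)
    return p[k_pos] / total if total > 0 else 0.5


# Made-up similarity scores of twelve documents to a single category.
scores = [0.05, 0.08, 0.10, 0.12, 0.07, 0.11, 0.09, 0.06, 0.80, 0.85, 0.78, 0.90]
params = fit_two_gaussians(scores)
for s in (0.08, 0.85):
    print(s, round(posterior_in_category(s, *params), 3))
```

Because the output is a probability rather than a raw similarity, scores for different categories become comparable, which is the score scaling issue named in the Motivation slide.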
Evaluation-Impact of LSI Similarity and GM on IL Performance
Evaluation-Extensional vs. Intensional Learning A major point of comparison between IL and Extensional Learning (EL) is the amount of supervision required to obtain a given level of performance.
Conclusions We obtained competitive performance using only the category names as initial seeds. The approach drastically reduces the number of seeds required while significantly improving performance.
Comments • Advantages: Performance • Disadvantage: Time • Applications: Text Mining