Clustering tagged documents with labeled and unlabeled documents

Presenter: Jian-Ren Chen. Authors: Chien-Liang Liu*, Wen-Hoar Hsaio, Chia-Hoang Lee, Chun-Hsien Chen. 2013, IPM.


Presentation Transcript


  1. Clustering tagged documents with labeled and unlabeled documents • Presenter: Jian-Ren Chen • Authors: Chien-Liang Liu*, Wen-Hoar Hsaio, Chia-Hoang Lee, Chun-Hsien Chen • 2013, IPM

  2. Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments

  3. Motivation • Tags can provide semantic information about resources, which helps machines perform classification and clustering tasks more accurately. • Probabilistic latent semantic analysis (PLSA) comes in two variants: the aspect model and the statistical clustering model.

  4. Objectives • This study employs Constrained-PLSA to cluster tagged documents using a small number of seeds (labeled documents). • Constrained-PLSA is based on the statistical clustering model rather than the aspect model.

  5. Methodology - PLSA • E-step • M-step • Figure labels: documents; terms (keywords) of the document collection
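
The slide's E-step and M-step formulas did not survive in the transcript. As a rough illustration only, the sketch below runs the textbook aspect-model PLSA updates on a document-term count matrix. The function name `plsa`, its parameters, and the random initialisation are assumptions made for this example, not taken from the slides or the paper.

```python
import math
import random


def _normalize(row):
    """Scale a non-negative list so it sums to 1 (uniform if it is all zeros)."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Hypothetical helper: textbook aspect-model PLSA fitted with EM.

    counts[d][w] is the number of times term w occurs in document d.
    Returns (P(z|d), P(w|z), log-likelihood recorded at each iteration).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random positive initialisation (an assumption; any positive start works).
    p_z_d = [_normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [_normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]
    log_liks = []
    for _ in range(n_iter):
        acc_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        acc_w_z = [[0.0] * n_words for _ in range(n_topics)]
        log_lik = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                log_lik += n * math.log(total)
                for z in range(n_topics):
                    resp = n * joint[z] / total
                    acc_z_d[d][z] += resp
                    acc_w_z[z][w] += resp
        # M-step: renormalise the expected counts gathered above.
        p_z_d = [_normalize(row) for row in acc_z_d]
        p_w_z = [_normalize(row) for row in acc_w_z]
        log_liks.append(log_lik)
    return p_z_d, p_w_z, log_liks
```

Calling `plsa(counts, n_topics=2)` on a small count matrix returns the per-document topic mixtures, the per-topic term distributions, and a log-likelihood trace that should never decrease from one iteration to the next.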

  6. Methodology - Constrained-PLSA • E-step • M-step
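
The Constrained-PLSA update rules are not preserved in the transcript, so the following is not the authors' algorithm. It is a generic seeded EM for a mixture of multinomials, where each document belongs to a single cluster (the clustering-style view mentioned on the Objectives slide) and each labeled seed document is clamped to its known cluster. The function name, the `seeds` mapping, and the `smooth` constant are all assumptions made for this sketch.

```python
import math


def seeded_multinomial_em(counts, n_clusters, seeds, n_iter=30, smooth=0.01):
    """Hypothetical sketch of seeded EM for a mixture of multinomials.

    counts[d][w] is the count of term w in document d.
    seeds maps a document index to its known cluster id (the labeled data).
    Returns post[d][c], the posterior cluster membership of each document.
    """
    n_docs, n_words = len(counts), len(counts[0])
    # Seeds start (and stay) one-hot; unlabeled documents start uniform.
    post = [[1.0 / n_clusters] * n_clusters for _ in range(n_docs)]
    for d, c in seeds.items():
        post[d] = [1.0 if k == c else 0.0 for k in range(n_clusters)]

    for _ in range(n_iter):
        # M-step: cluster priors and smoothed term distributions per cluster.
        prior = [smooth + sum(post[d][c] for d in range(n_docs))
                 for c in range(n_clusters)]
        prior_total = sum(prior)
        prior = [p / prior_total for p in prior]
        p_w_c = []
        for c in range(n_clusters):
            row = [smooth + sum(post[d][c] * counts[d][w] for d in range(n_docs))
                   for w in range(n_words)]
            row_total = sum(row)
            p_w_c.append([x / row_total for x in row])

        # E-step: update unlabeled documents only; seeds remain clamped.
        for d in range(n_docs):
            if d in seeds:
                continue
            log_p = [math.log(prior[c])
                     + sum(n * math.log(p_w_c[c][w])
                           for w, n in enumerate(counts[d]) if n)
                     for c in range(n_clusters)]
            top = max(log_p)  # subtract the max before exp for stability
            unnorm = [math.exp(lp - top) for lp in log_p]
            total = sum(unnorm)
            post[d] = [u / total for u in unnorm]
    return post
```

With one seed per cluster, for example `seeds={0: 0, 3: 1}`, the clamped documents break the symmetry between clusters from the first iteration, which is one plausible reason a handful of labels can stabilise clustering on noisy data.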

  7. Experiments - Data set A (CiteULike)

  8. Experiments (Data set A)

  9. Experiments - Data set B (CiteULike)

  10. Experiments (Data set B)

  11. Conclusions • The “tags as words” representation scheme performs more stably than the “words + tags” representation scheme. • Unsupervised learning methods fail to function properly on the data set with noisy information, whereas Constrained-PLSA functions properly and remains stable even when only a small amount of labeled data is available.

  12. Comments • Advantage - Constrained-PLSA outperforms the other methods • Disadvantage - too much manual processing in the experiments • Applications - text mining; tagged document clustering
