Clustering Documents in a Web Directory

Presenter: Shu-Ya Li. Authors: Giordano Adami, Paolo Avesani, Diego Sona. WIDM 2003.

Presentation Transcript


  1. Clustering Documents in a Web Directory. Presenter: Shu-Ya Li. Authors: Giordano Adami, Paolo Avesani, Diego Sona. WIDM 2003

  2. Outline • Motivation • Objective • Methodology • Experiments and Results • Conclusion • Personal Comments

  3. [Figure: example taxonomy with nodes Primates, Monkey, Apes, Gorillas, Chimpanzees] Motivation • Bootstrapping a huge hierarchy with a proper set of labeled examples is a critical issue. • Bootstrapping: automatic annotation of labeled taxonomies with flat sets of data, helping users design their own data structures; the user can then remove wrongly distributed documents. Bootstrapping steps • Generate candidate documents from the Web • Classify the candidate documents • Have experts filter out misclassified documents

  4. Objectives • This paper aims at developing a supporting tool that reduces the human effort required to annotate a taxonomy with examples. • To address the bootstrapping problem, TaxSOM is compared with approaches such as the standard prototype-based classifier: • the baseline approach • the “constrained” K-means approach
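The slide names the two reference methods without detail. Below is a minimal Python sketch under stated assumptions: the baseline is read as building one prototype vector per taxonomy node from its label words and assigning each document to the most cosine-similar prototype, and "constrained" K-means is read as K-means whose K centroids are seeded from those label prototypes, so each cluster stays tied to its node. These readings and all function names (`encode`, `baseline_classify`, `label_seeded_kmeans`) are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def encode(text, vocab):
    """Word-frequency vector over a fixed vocabulary, L2-normalized."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def baseline_classify(docs, prototypes):
    """Hypothetical baseline: index of the most similar node prototype per doc.
    Ties go to the lowest index."""
    return [max(range(len(prototypes)), key=lambda k: cosine(d, prototypes[k]))
            for d in docs]


def label_seeded_kmeans(docs, prototypes, iters=10):
    """Assumed reading of 'constrained' K-means: one cluster per taxonomy node,
    centroids seeded from the label prototypes, then ordinary K-means updates."""
    centroids = [list(p) for p in prototypes]
    assign = baseline_classify(docs, centroids)
    for _ in range(iters):
        for k in range(len(centroids)):
            members = [d for d, a in zip(docs, assign) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
        new_assign = baseline_classify(docs, centroids)
        if new_assign == assign:  # converged
            break
        assign = new_assign
    return assign
```

For example, with `vocab = ["monkey", "gorilla", "chimpanzee", "banana", "forest"]` and one prototype per label word, the documents "monkey banana banana", "gorilla forest" and "chimpanzee banana forest" land in clusters 0, 1 and 2 under both methods. Seeding the centroids from the labels is what keeps each K-means cluster aligned with a taxonomy node; unseeded K-means would return anonymous clusters.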

  5. Methodology - TaxSOM • Encode all documents in the data set as fixed-size, normalized vectors (word frequencies over a vocabulary) • The initial weights of the nodes (models) are chosen randomly, while forcing the presence of the node labels (e.g., giving the label words the maximum frequency) • Learning then proceeds by iteratively updating the weights. [Figure: node labels; an input pattern compared against the node codebooks]
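The slide gives TaxSOM's steps only at a high level. Below is a minimal Python sketch of those steps, assuming TaxSOM behaves like a self-organizing map whose neighbourhood is the hop distance in the taxonomy tree instead of a grid lattice. The Gaussian neighbourhood, the linear decay schedules, the 0.1 noise scale, and all function names (`tree_distances`, `init_codebooks`, `train_taxsom`, `assign`) are hypothetical choices. Labels are forced only at initialization, as the slide describes. Documents are assumed to be already encoded as normalized word-frequency vectors.

```python
import math
import random
from collections import deque


def tree_distances(parent):
    """All-pairs hop distance in the taxonomy, treated as an undirected tree.
    `parent` maps each node to its parent (None for the root)."""
    nodes = set(parent) | {p for p in parent.values() if p is not None}
    adj = {n: set() for n in nodes}
    for child, par in parent.items():
        if par is not None:
            adj[child].add(par)
            adj[par].add(child)
    dist = {}
    for src in nodes:  # breadth-first search from every node
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dist[src] = seen
    return dist


def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def init_codebooks(labels, vocab, rng):
    """Random codebooks with each node's label words forced to the maximum value."""
    books = {}
    for node, label_words in labels.items():
        vec = [rng.random() * 0.1 for _ in vocab]  # small random noise
        for w in label_words:
            vec[vocab.index(w)] = 1.0  # force the label to dominate
        books[node] = normalize(vec)
    return books


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def best_node(x, books):
    """Node whose codebook is closest to pattern x (the winner)."""
    return min(books, key=lambda n: sq_dist(x, books[n]))


def train_taxsom(docs, parent, labels, vocab, epochs=20, lr0=0.5, sigma0=1.0, seed=0):
    """Hypothetical TaxSOM-style training; `labels` must cover every taxonomy node."""
    rng = random.Random(seed)
    dist = tree_distances(parent)
    books = init_codebooks(labels, vocab, rng)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                  # linearly decaying learning rate
        sigma = max(sigma0 * (1 - frac), 0.1)  # shrinking neighbourhood radius
        order = list(range(len(docs)))
        rng.shuffle(order)
        for i in order:
            x = docs[i]
            win = best_node(x, books)
            for n, w in books.items():
                # nodes close to the winner in the taxonomy move more
                h = math.exp(-dist[win][n] ** 2 / (2 * sigma ** 2))
                books[n] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return books


def assign(docs, books):
    """Place each document under the node with the closest codebook."""
    return [best_node(x, books) for x in docs]
```

In this sketch, forcing the label words to dominate each initial codebook is what ties a node to its meaning. Using taxonomy distance as the neighbourhood then lets documents that update a node (e.g. Gorillas) also pull its parent Apes, but barely affect Monkey.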

  6. Experiments

  7. Conclusion • We proposed the TaxSOM model, which improves on the performance of the baseline and K-means approaches by explicitly including the taxonomy knowledge in the model.

  8. Personal Comments • Advantage • … • Drawback • … • Application • Web Directory