Clustering Documents in a Web Directory

Presenter: Shu-Ya Li
Authors: Giordano Adami, Paolo Avesani, Diego Sona
WIDM 2003

Outline
• Motivation
• Objective
• Methodology
• Experiments and Results
• Conclusion
• Personal Comments

Motivation
[Figure: example taxonomy with the nodes Primates, Monkey, Apes, Gorillas, Chimpanzees]
• Bootstrapping a huge hierarchy with a proper set of labeled examples is a critical issue.
• Bootstrapping
  • Automatic annotation of labeled taxonomies with flat sets of data, helping users design their own data structures.
  • The user can then remove wrongly distributed documents.
• Bootstrapping
  • Generate candidate documents from the Web.
  • Classify the candidate documents.
  • Have experts filter out the misclassified documents.

Objectives
• This paper aims to develop a supporting tool that reduces the human effort required to annotate a taxonomy with examples.
• To address the bootstrapping problem, approaches such as the standard prototype-based classifier are considered:
  • the baseline approach
  • the “constrained” K-means approach

Methodology - TaxSOM
• Encode all documents in the data set as fixed-size, normalized vectors (frequencies of the words in a vocabulary).
• Choose the initial weights of the nodes (models) at random, forcing the presence of the node labels (e.g. by giving the label words the maximum frequency).
• Start learning, iteratively updating the weights.
[Figure: node labels, pattern, compare, codebooks]
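The three TaxSOM steps above can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names (`encode`, `init_codebooks`, `winner`, `train`), the toy taxonomy and documents, the learning-rate schedule, and the rule that the winner's parent and children are also moved with a smaller step (`neighbor_scale`) are all assumptions borrowed from a generic self-organizing map. The slides only state that documents become normalized word-frequency vectors, that initial weights are random with the node labels forced in, and that weights are updated iteratively.

```python
import math
import random

def normalize(vec):
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def encode(text, vocab):
    """Step 1: fixed-size, normalized word-frequency vector over the vocabulary."""
    words = text.lower().split()
    return normalize([float(words.count(w)) for w in vocab])

def init_codebooks(node_labels, vocab, rng):
    """Step 2: random initial weights, with each node's label words forced to the maximum."""
    codebooks = {}
    for node, label in node_labels.items():
        weights = [0.1 * rng.random() for _ in vocab]
        for word in label.lower().split():
            if word in vocab:
                weights[vocab.index(word)] = 1.0
        codebooks[node] = normalize(weights)
    return codebooks

def winner(vec, codebooks):
    """Compare a pattern with every codebook; return the closest node (squared Euclidean distance)."""
    return min(codebooks,
               key=lambda n: sum((a - b) ** 2 for a, b in zip(vec, codebooks[n])))

def train(docs, node_labels, edges, vocab, epochs=20, lr=0.3, neighbor_scale=0.2, seed=0):
    """Step 3: iteratively move the winning node, and (assumed) its taxonomy neighbors, toward each pattern."""
    rng = random.Random(seed)
    codebooks = init_codebooks(node_labels, vocab, rng)
    neighbors = {node: set() for node in node_labels}
    for parent, child in edges:
        neighbors[parent].add(child)
        neighbors[child].add(parent)
    patterns = [encode(doc, vocab) for doc in docs]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # linearly decaying learning rate (assumed schedule)
        for vec in patterns:
            best = winner(vec, codebooks)
            steps = [(best, rate)] + [(n, rate * neighbor_scale) for n in neighbors[best]]
            for node, step in steps:
                codebooks[node] = [w + step * (x - w) for w, x in zip(codebooks[node], vec)]
    return codebooks

# Toy data (hypothetical), loosely following the Primates example taxonomy.
VOCAB = ["primates", "monkey", "apes", "gorillas", "chimpanzees"]
NODE_LABELS = {name: name for name in VOCAB}
EDGES = [("primates", "monkey"), ("primates", "apes"),
         ("apes", "gorillas"), ("apes", "chimpanzees")]
DOCS = ["gorillas live in groups and gorillas eat plants",
        "chimpanzees use tools",
        "a monkey has a tail"]

codebooks = train(DOCS, NODE_LABELS, EDGES, VOCAB)
for doc in DOCS:
    print(winner(encode(doc, VOCAB), codebooks), "<-", doc)
```

After training, each document is assigned to the node returned by `winner`. In this sketch the taxonomy enters the model in two places: the labels seed the codebooks, and the tree edges decide which other nodes are nudged together with the winner.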

Experiments

Conclusion
• We proposed the TaxSOM model, which improves on the performance of the baseline and K-means approaches by explicitly including the taxonomy knowledge in the model.

Personal Comments
• Advantage
  • …
• Drawback
  • …
• Application
  • Web Directory
