This presentation by Wei-Hao Huang discusses dynamic hierarchical algorithms designed for clustering documents in ever-expanding digital environments. It outlines the motivation behind adaptive clustering methods, addressing the limitations of static approaches that require complete datasets. The methodology focuses on a dynamic hierarchical agglomerative framework, featuring algorithms such as Dynamic Hierarchical Compact (DHC) and Dynamic Hierarchical Star (DHS). The results from experiments using benchmark text collections demonstrate improvements in clustering efficiency, quality, and adaptability, making these methods ideal for managing dynamic data environments.
Dynamic hierarchical algorithms for document clustering • Presenter: Wei-Hao Huang • Authors: Reynaldo Gil-García, Aurora Pons-Porrata • PRL, 2010
Outline • Motivation • Objectives • Hierarchical clustering • Methodology • Experiments • Conclusions • Comments
Motivation • The World Wide Web and the number of text documents managed in organizational intranets continue to grow at an amazing speed. • In dynamic information environments it is usually desirable to apply adaptive methods for document organization, such as clustering.
Objectives • Static clustering methods mainly rely on having the whole collection available before the algorithm is applied. • Propose dynamic algorithms that can update the clustering without performing a complete reclustering. • The result should be independent of the data order.
Hierarchical clustering • Agglomerative and divisive • Provides data views at different levels
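To make "data views at different levels" concrete, here is a minimal sketch of a static, single-link agglomerative procedure that records the partition after every merge. It is only an illustration of the general agglomerative idea, not the dynamic algorithms from the paper; the function name `agglomerate` and the toy similarity matrix are assumptions introduced for this example.

```python
def agglomerate(sim):
    """Single-link agglomerative clustering over a symmetric similarity matrix.

    Returns one partition per level, from singletons up to a single cluster.
    (Illustrative static procedure; not the DHC/DHS algorithms.)
    """
    clusters = [frozenset([i]) for i in range(len(sim))]
    levels = [sorted(sorted(c) for c in clusters)]
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the highest single-link similarity.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = max(sim[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or s > best[0]:
                    best = (s, a, b)
        _, a, b = best
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        levels.append(sorted(sorted(c) for c in clusters))
    return levels


# Toy similarity matrix for four documents (hypothetical values).
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
levels = agglomerate(sim)
for depth, partition in enumerate(levels):
    print(depth, partition)
```

Each entry of `levels` is one view of the collection: fine-grained near the start of the list, coarse near the end. Note that this static version has to be rerun from scratch whenever a document is added, which is the limitation the dynamic algorithms aim to avoid.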
Methodology • Dynamic hierarchical agglomerative framework • Specific algorithms: • Dynamic hierarchical compact (DHC): creates disjoint hierarchies of clusters • Dynamic hierarchical star (DHS): produces overlapping hierarchies
Dynamic hierarchical agglomerative framework • β-similarity: β is the minimum similarity threshold • Clusters i and j are β-similar if their similarity is >= β • Cluster i is β-isolated if its similarity with every other cluster is < β
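The β-similarity definitions can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: cosine similarity over term-weight dictionaries is used as the similarity measure, and the helper names (`cosine`, `beta_similarity_graph`, `beta_isolated`) and the toy data are hypothetical, not taken from the paper.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts (assumed measure)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def beta_similarity_graph(clusters, beta, sim=cosine):
    """Link clusters i and j whenever their similarity is >= beta."""
    graph = {i: set() for i in range(len(clusters))}
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if sim(clusters[i], clusters[j]) >= beta:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def beta_isolated(graph):
    """Clusters with no beta-similar neighbour are beta-isolated."""
    return sorted(i for i, neighbours in graph.items() if not neighbours)


# Three toy clusters, each represented by a term-weight vector (hypothetical).
clusters = [
    {"cluster": 2, "document": 1},
    {"cluster": 1, "document": 2},
    {"graph": 1, "star": 1},
]
graph = beta_similarity_graph(clusters, beta=0.5)
isolated = beta_isolated(graph)
print(graph)
print(isolated)
```

In this toy data the first two clusters share vocabulary and end up β-similar, while the third shares no terms with either and is β-isolated. Raising β makes the graph sparser and leaves more clusters isolated.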
Experiments • Using 15 benchmark text collections. • Clustering quality • Sensitivity to parameters • Balance • Efficiency
Conclusions • Methods are suitable for producing hierarchical clustering solutions in dynamic environments effectively and efficiently. • Better balance between depth and width. • Offer hierarchies easier to browse than traditional algorithms.
Comments • Advantages • Deal with dynamic data sets. • Effectiveness and the efficiency of the clustering. • Applications • Hierarchical clustering