
Dynamic hierarchical algorithms for document clustering

Presenter: Wei-Hao Huang. Authors: Reynaldo Gil-García, Aurora Pons-Porrata. PRL, 2010.



  1. Dynamic hierarchical algorithms for document clustering • Presenter: Wei-Hao Huang • Authors: Reynaldo Gil-García, Aurora Pons-Porrata • PRL, 2010

  2. Outline • Motivation • Objectives • Hierarchical clustering • Methodology • Experiments • Conclusions • Comments

  3. Motivation • The World Wide Web and the number of text documents managed in organizational intranets continue to grow at an amazing speed. • In dynamic information environments, it is usually desirable to apply adaptive methods for document organization, such as clustering.

  4. Objectives • Static clustering methods mainly rely on having the whole collection available before the algorithm is applied. • Develop dynamic algorithms that can update the clustering without performing a complete reclustering. • Make the results independent of the data order.

  5. Hierarchical clustering • Agglomerative and divisive approaches • Provide views of the data at different levels

  6. Methodology • A dynamic hierarchical agglomerative framework • Specific algorithms: • Dynamic hierarchical compact (DHC): creates disjoint hierarchies of clusters • Dynamic hierarchical star (DHS): produces overlapping hierarchies

  7. Dynamic hierarchical agglomerative framework: β-similarity • β is the minimum similarity threshold • Cluster i is β-similar to cluster j if their similarity is ≥ β • Cluster i is β-isolated if its similarity with every other cluster is < β
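A minimal sketch of the two definitions on slide 7, assuming cluster similarities are already available as a symmetric matrix `sim` (the slides do not say how inter-cluster similarity is computed). The function names `beta_neighbors` and `beta_isolated` are hypothetical.

```python
from typing import Dict, List, Set


def beta_neighbors(sim: List[List[float]], beta: float) -> Dict[int, Set[int]]:
    """Map each cluster index to the clusters it is beta-similar to (sim >= beta)."""
    n = len(sim)
    neighbors: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= beta:  # assumes sim is symmetric
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors


def beta_isolated(sim: List[List[float]], beta: float) -> List[int]:
    """Clusters whose similarity with every other cluster is below beta."""
    return [i for i, nbrs in beta_neighbors(sim, beta).items() if not nbrs]


sim = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
print(beta_neighbors(sim, beta=0.5))  # {0: {1}, 1: {0}, 2: set()}
print(beta_isolated(sim, beta=0.5))   # [2]
```

Lowering β makes more pairs β-similar; with β = 0.05 in this example no cluster is β-isolated.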

  8. Dynamic hierarchical agglomerative framework

  9. Updating the max-S graph
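Slide 9 does not define the max-S graph. The sketch below assumes one plausible reading: each vertex has directed edges to its most β-similar neighbor(s), ties included, and a β-isolated vertex has no outgoing edge. Under that assumption, inserting a document only requires its new row of similarities and a check of whether the newcomer beats or ties each existing vertex's current best. `max_s_graph` and `add_vertex` are hypothetical names, and only insertions are handled.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]


def _best_neighbors(sim: List[List[float]], i: int, beta: float) -> Set[int]:
    """Neighbors of i with maximal similarity among those >= beta (ties kept)."""
    candidates = [(sim[i][j], j) for j in range(len(sim)) if j != i and sim[i][j] >= beta]
    if not candidates:
        return set()  # i is beta-isolated: no outgoing edge
    best = max(s for s, _ in candidates)
    return {j for s, j in candidates if s == best}


def max_s_graph(sim: List[List[float]], beta: float) -> Graph:
    """Build the (assumed) directed max-S graph from scratch."""
    return {i: _best_neighbors(sim, i, beta) for i in range(len(sim))}


def add_vertex(graph: Graph, sim: List[List[float]], beta: float) -> None:
    """Update graph in place after sim gained a new last row and column."""
    new = len(sim) - 1
    graph[new] = _best_neighbors(sim, new, beta)
    for i in range(new):
        s = sim[i][new]
        if s < beta:
            continue  # newcomer is not beta-similar to i; i's edges are unchanged
        current = max((sim[i][j] for j in graph[i]), default=None)
        if current is None or s > current:
            graph[i] = {new}  # newcomer is i's new most similar neighbor
        elif s == current:
            graph[i].add(new)  # tie: keep both


sim = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]]
g = max_s_graph(sim, beta=0.5)  # {0: {1}, 1: {0}, 2: set()}
# A new document 3 arrives with similarities 0.9, 0.3, 0.6 to documents 0, 1, 2.
extended = [row + [s] for row, s in zip(sim, [0.9, 0.3, 0.6])] + [[0.9, 0.3, 0.6, 1.0]]
add_vertex(g, extended, beta=0.5)
print(g)  # {0: {3}, 1: {0}, 2: {3}, 3: {0}}
```

The incremental result matches rebuilding the graph from the extended matrix, but only the new row and column were inspected.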

  10. Dynamic hierarchical compact: Connected component cover
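Assuming, as the slide title suggests, that DHC takes the connected components of the max-S graph (with edge direction ignored) as the disjoint clusters of a level, one level could be computed as below. For clarity this recomputes the components from scratch instead of updating them incrementally as a dynamic algorithm would; `connected_components` is a hypothetical helper.

```python
from collections import deque
from typing import Dict, List, Set


def connected_components(graph: Dict[int, Set[int]]) -> List[Set[int]]:
    """Connected components of a directed graph, ignoring edge direction."""
    undirected: Dict[int, Set[int]] = {v: set() for v in graph}
    for u, nbrs in graph.items():
        for v in nbrs:
            undirected[u].add(v)
            undirected.setdefault(v, set()).add(u)

    seen: Set[int] = set()
    components: List[Set[int]] = []
    for start in sorted(undirected):
        if start in seen:
            continue
        component: Set[int] = set()
        queue = deque([start])
        seen.add(start)
        while queue:  # breadth-first search from start
            u = queue.popleft()
            component.add(u)
            for v in undirected[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        components.append(component)
    return components


print(connected_components({0: {1}, 1: {0}, 2: set()}))        # [{0, 1}, {2}]
print(connected_components({0: {3}, 1: {0}, 2: {3}, 3: {0}}))  # [{0, 1, 2, 3}]
```

Each component is disjoint from the others, which fits DHC producing disjoint hierarchies.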

  11. Dynamic hierarchical star: star cover updating
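The slides do not detail the star cover used by DHS. As an illustration only, this is the greedy offline star cover from Aslam et al.'s star clustering algorithm: repeatedly take the highest-degree vertex not yet covered as a star center and group it with all of its neighbors. Because a satellite can be adjacent to several centers, stars may overlap, which is consistent with DHS producing overlapping hierarchies. The input is assumed to be an undirected adjacency map, and the incremental updating named on the slide is not shown.

```python
from typing import Dict, List, Set


def star_cover(graph: Dict[int, Set[int]]) -> List[Set[int]]:
    """Greedy star cover of an undirected graph (symmetric adjacency sets).

    Vertices are visited by decreasing degree (ties by id); each vertex not yet
    covered becomes a star center, and its star is the center plus all its
    neighbors. A satellite adjacent to two centers lands in both stars.
    """
    order = sorted(graph, key=lambda v: (-len(graph[v]), v))
    covered: Set[int] = set()
    stars: List[Set[int]] = []
    for v in order:
        if v in covered:
            continue
        stars.append({v} | graph[v])
        covered.add(v)
        covered |= graph[v]
    return stars


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(star_cover(path))  # [{0, 1, 2}, {2, 3, 4}]  (vertex 2 is shared)
```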

  12. Experiments • 15 benchmark text collections • Evaluation criteria: clustering quality, sensitivity to parameters, balance, efficiency

  13. Clustering quality: overall F1 measure
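A sketch of the overall F1 measure in its usual hierarchical form, which the slide does not spell out: each class is matched to the cluster, anywhere in the hierarchy, with the highest F1 score, and the per-class scores are averaged with weights proportional to class size. The input layout (classes as sets of document ids, every node of the hierarchy flattened into `clusters`) is an assumption.

```python
from typing import Dict, List, Set


def overall_f1(classes: Dict[str, Set[int]], clusters: List[Set[int]]) -> float:
    """Size-weighted average, over classes, of each class's best F1 over all clusters."""
    n = sum(len(members) for members in classes.values())  # assumes disjoint classes
    total = 0.0
    for members in classes.values():
        best = 0.0
        for cluster in clusters:
            overlap = len(members & cluster)
            if overlap == 0:
                continue
            precision = overlap / len(cluster)
            recall = overlap / len(members)
            best = max(best, 2 * precision * recall / (precision + recall))
        total += len(members) / n * best
    return total


classes = {"a": {0, 1}, "b": {2, 3}}
hierarchy = [{0, 1}, {2}, {3}, {2, 3}, {0, 1, 2, 3}]  # every node of a small tree
print(overall_f1(classes, hierarchy))       # 1.0
print(overall_f1(classes, [{0, 1, 2, 3}]))  # about 0.667 (2/3)
```

Matching against every node rewards a hierarchy that contains each class somewhere, at whatever depth.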

  14. Clustering quality: FCubed measure
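The slide does not define the FCubed measure. Assuming it refers to the BCubed F-measure, a flat, non-overlapping version computes precision and recall from each document's point of view, averages them, and combines the averages with a harmonic mean. How the paper applies it to hierarchies or overlapping clusters is not stated, so this covers only the simplest case; `bcubed_f` is a hypothetical name.

```python
from typing import Dict, Hashable


def bcubed_f(cluster_of: Dict[int, Hashable], class_of: Dict[int, Hashable]) -> float:
    """BCubed F-measure for a flat, non-overlapping clustering."""
    items = list(cluster_of)
    precision = recall = 0.0
    for e in items:
        same_cluster = [x for x in items if cluster_of[x] == cluster_of[e]]
        same_class = [x for x in items if class_of[x] == class_of[e]]
        correct = sum(1 for x in same_cluster if class_of[x] == class_of[e])
        precision += correct / len(same_cluster)  # purity of e's cluster, seen from e
        recall += correct / len(same_class)       # coverage of e's class, seen from e
    precision /= len(items)
    recall /= len(items)
    return 2 * precision * recall / (precision + recall)


labels = {0: "a", 1: "a", 2: "b", 3: "b"}
print(bcubed_f({0: "x", 1: "x", 2: "y", 3: "y"}, labels))  # 1.0
print(bcubed_f({0: "x", 1: "x", 2: "x", 3: "x"}, labels))  # about 0.667 (2/3)
```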

  15. Clustering quality: HF1

  16. Sensitivity to parameters

  17. Depth and width of the hierarchies

  18. Efficiency

  19. Conclusions • The proposed methods produce hierarchical clustering solutions in dynamic environments effectively and efficiently. • They achieve a better balance between hierarchy depth and width. • They offer hierarchies that are easier to browse than those of traditional algorithms.

  20. Comments • Advantages • Handle dynamic data sets • Effective and efficient clustering • Applications • Hierarchical clustering
