
Parallel C3M

Parallel C3M. Aylin Tokuç, Erkan Okuyan, Özlem Gür. Outline: Basics of parallel computing, Sequential C3M, Parallel C3M. Parallel Computation. Decomposition: the process of dividing a computation into smaller parts.


Presentation Transcript


  1. Parallel C3M. Aylin Tokuç, Erkan Okuyan, Özlem Gür

  2. Outline • Basics of parallel computing • Sequential C3M • Parallel C3M

  3. Parallel Computation. Decomposition: the process of dividing a computation into smaller parts. Task: a programmer-defined unit of computation into which the main computation is subdivided by means of decomposition.

  4. Parallel Computation: Primary Considerations • Load balancing • Minimizing communication • Task dependency optimization

  5. Parallel Computation: Load Balancing

  6. Parallel Computation: Minimizing Communication

  7. Parallel Computation: Task Dependency Optimization

  8. C3M Algorithm: 1. Determine the cluster seeds of the database. 2. If d_i is not a cluster seed, find the cluster seed (if any) that maximally covers d_i. 3. If there remain unclustered documents, group them into a ragbag cluster.
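
Steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `cover[i][j]` is taken to be the extent to which document i is covered by document j, the seeds are assumed to be already chosen, and the function name, the sample matrix, and the tie-breaking rule (first seed in list order wins) are all hypothetical rather than taken from the slides.

```python
def assign_to_seeds(cover, seeds):
    """Assign each non-seed document to the seed that covers it most.

    cover[i][j]: assumed coverage of document i by document j.
    Returns (clusters, ragbag); clusters maps seed index -> member list.
    """
    clusters = {s: [s] for s in seeds}
    ragbag = []
    for i in range(len(cover)):
        if i in clusters:
            continue  # seeds already head their own cluster
        # Ties are broken by seed order here (an assumption of this sketch).
        best = max(seeds, key=lambda s: cover[i][s])
        if cover[i][best] > 0:
            clusters[best].append(i)
        else:
            ragbag.append(i)  # no seed covers this document at all
    return clusters, ragbag


# Hypothetical 4-document cover matrix; documents 0 and 2 are the seeds.
cover = [
    [0.5, 0.3, 0.2, 0.0],
    [0.2, 0.4, 0.4, 0.0],
    [0.2, 0.3, 0.5, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
clusters, ragbag = assign_to_seeds(cover, seeds=[0, 2])
print(clusters, ragbag)
```

Document 1 joins the cluster of seed 2, and document 3, which no seed covers, falls into the ragbag cluster.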

  9. C3M Formulas
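
The formulas on this slide did not survive in the transcript. For orientation, the cover coefficient quantities are usually defined as below in Can and Ozkarahan's cover coefficient paper, for an m × n document-by-term matrix D. The slide may have used different notation, so treat this as a reference sketch rather than a reconstruction of the slide. Here δ'_j and ψ'_j denote the term-side (column) counterparts of δ_i and ψ_i.

```latex
\begin{align*}
\alpha_i &= \Bigl(\sum_{j=1}^{n} d_{ij}\Bigr)^{-1}, &
\beta_k &= \Bigl(\sum_{i=1}^{m} d_{ik}\Bigr)^{-1}, \\
c_{ij} &= \alpha_i \sum_{k=1}^{n} d_{ik}\,\beta_k\,d_{jk}, &
\delta_i &= c_{ii}, \qquad \psi_i = 1 - \delta_i, \\
n_c &= \sum_{i=1}^{m} \delta_i, &
P_i &= \delta_i\,\psi_i \sum_{j=1}^{n} d_{ij}
      \quad \text{(binary } D\text{)}, \\
& &
P_i &= \delta_i\,\psi_i \sum_{j=1}^{n} d_{ij}\,\delta'_j\,\psi'_j
      \quad \text{(weighted } D\text{)}.
\end{align*}
```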

  10. C3M – Sample Matrices

  11. Parallel C3M: Distribution. Distribute rows among processors • Load balancing by cyclic block distribution
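
A cyclic block distribution of rows can be illustrated as follows. This is a minimal sketch, assuming blocks of `block_size` consecutive rows are dealt to processors in round-robin order; the function names and the parameter values are hypothetical, since the slides do not give the block size or processor count.

```python
def block_cyclic_owner(row, block_size, num_procs):
    """Processor that owns `row` when blocks are dealt out round-robin."""
    return (row // block_size) % num_procs


def distribute_rows(num_rows, block_size, num_procs):
    """Map each processor id to the list of row indices it owns."""
    owned = {p: [] for p in range(num_procs)}
    for row in range(num_rows):
        owned[block_cyclic_owner(row, block_size, num_procs)].append(row)
    return owned


# Hypothetical sizes: 10 document rows, blocks of 2 rows, 3 processors.
layout = distribute_rows(num_rows=10, block_size=2, num_procs=3)
print(layout)
```

With these sizes processor 0 owns rows 0, 1, 6, 7, processor 1 owns rows 2, 3, 8, 9, and processor 2 owns rows 4, 5. Interleaving blocks this way spreads rows from every region of the matrix across processors, which helps when row lengths vary systematically.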

  12. Local Calculations: All processors calculate α, partial β and Pi. Current method for weighted matrix: too costly. It needs column vectors, but the matrix is partitioned row-wise.

  13. Seed Powers Pi • The seed power Pi should be small for a document whose terms appear in too many or too few documents. • The seed power Pi should be bigger for a document whose terms appear in a moderate number of documents.

  14. Minimize Communication - Proposed Heuristic: All processors calculate α, partial β and β’ # of non-zeros

  15. Effectiveness of Heuristic • A MATLAB script was written to evaluate the effectiveness of the proposed heuristic. • Correlation coefficient = 0.95

  16. Communication between Processors • Partial β and β’ vectors are exchanged between processors to calculate the final β and β’ vectors. • Then, all processors calculate cii = δi
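
The exchange of partial β vectors can be sketched without any message-passing library by simulating the processors as lists. Assumptions in this sketch: the "partial β" of a processor is the vector of column sums over its local rows, the final β is the reciprocal of the summed vector, and δ_i = c_ii = α_i Σ_k d_ik² β_k as in the standard cover coefficient definition. The function names and the 4 × 3 sample matrix are hypothetical, and β’ is left out because the slides do not define it.

```python
def partial_column_sums(local_rows, num_terms):
    """Column sums over the rows held by one processor."""
    sums = [0.0] * num_terms
    for row in local_rows:
        for k, weight in enumerate(row):
            sums[k] += weight
    return sums


def all_reduce_sum(partials):
    """Element-wise sum of all partial vectors (stand-in for an all-reduce)."""
    return [sum(values) for values in zip(*partials)]


def local_deltas(local_rows, beta):
    """delta_i = alpha_i * sum_k d_ik^2 * beta_k for each local row."""
    deltas = []
    for row in local_rows:
        alpha = 1.0 / sum(row)  # assumes every document has at least one term
        deltas.append(alpha * sum(w * w * b for w, b in zip(row, beta)))
    return deltas


# Hypothetical binary matrix split row-wise over two simulated processors.
rows_p0 = [[1, 1, 0], [1, 0, 0]]
rows_p1 = [[0, 1, 1], [0, 1, 0]]

partials = [partial_column_sums(r, 3) for r in (rows_p0, rows_p1)]
column_sums = all_reduce_sum(partials)
beta = [1.0 / s for s in column_sums]  # assumes every term occurs somewhere

deltas_p0 = local_deltas(rows_p0, beta)
deltas_p1 = local_deltas(rows_p1, beta)
print(column_sums, deltas_p0, deltas_p1)
```

Only one vector of length n (the number of terms) travels per processor, and after the reduction each processor can compute δ for its own rows with no further communication.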

  17. Number of Clusters • Processors exchange local δ • All processors calculate nc
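
A small sketch of this step, assuming nc is the sum of all δ values (the usual cover coefficient definition) and that each processor sends only the sum of its local δ values rather than the full list. The function name and the δ values are hypothetical, and any rounding of nc to an integer is left to the caller.

```python
def number_of_clusters(local_delta_lists):
    """Sum the per-processor delta totals to obtain n_c."""
    local_sums = [sum(deltas) for deltas in local_delta_lists]  # one scalar each
    return sum(local_sums)  # every processor ends up with the same total


# Hypothetical local delta values held by two simulated processors.
n_c = number_of_clusters([[5 / 12, 0.5], [2 / 3, 1 / 3]])
print(n_c)
```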

  18. Cluster-head Selection • Calculate the seed powers of local documents. • Exchange the largest nc seed powers. • Find the largest nc seed powers among all Pi; the corresponding documents are the cluster heads.
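
The selection can be sketched with Python's `heapq.nlargest`. The idea is that the global top nc seed powers must lie in the union of each processor's local top nc, so only those candidates need to be exchanged. The function names, the (power, document id) pair layout, and the sample values are assumptions made for this illustration.

```python
import heapq


def local_top(seed_powers, n_c):
    """Largest n_c (power, doc_id) pairs among one processor's documents."""
    return heapq.nlargest(n_c, seed_powers)


def select_cluster_heads(candidates_per_proc, n_c):
    """Merge every processor's candidates and keep the global top n_c."""
    merged = [pair for candidates in candidates_per_proc for pair in candidates]
    return [doc_id for _, doc_id in heapq.nlargest(n_c, merged)]


# Hypothetical seed powers as (power, doc_id) pairs on two processors.
powers_p0 = [(0.9, 0), (0.2, 1), (0.5, 2)]
powers_p1 = [(0.7, 3), (0.8, 4), (0.1, 5)]
n_c = 2

candidates = [local_top(p, n_c) for p in (powers_p0, powers_p1)]
heads = select_cluster_heads(candidates, n_c)
print(heads)
```

Each processor sends at most nc pairs, so the communication volume depends on the number of clusters and processors, not on the number of documents.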

  19. Clustering Non-seed Docs • Exchange seed documents • Cluster non-seed documents (as in sequential C3M) in each processor.

  20. Future Work • Term-based clustering • Overlapping clusters

  21. C3M Summary • Load balancing with cyclic block distribution • Communication minimization by a new heuristic • Task dependency minimized with block distribution and the heuristic.

  22. References • Concepts and the effectiveness of the cover coefficient-based clustering methodology, F. Can, E. A. Ozkarahan • Parallelizing the Buckshot Algorithm for Efficient Document Clustering, Eric C. Jensen, Steven M. Beitzel, Angelo J. Pilotto, Nazli Goharian, Ophir Frieder • Clustering and Classification of Large Document Bases in a Parallel Environment, Anthony S. Ruocco, Ophir Frieder • Efficient Clustering of Very Large Document Collections, I. S. Dhillon, J. Fan, Y. Guan

  23. Questions?

  24. The End. Thank you for your patience.
