
Multiobjective Clustering with Automatic k-determination for Large-scale Data

Multiobjective Clustering with Automatic k-determination for Large-scale Data. Presenter: Shao-Wei Cheng. Authors: Nobukazu Matake, Tomoyuki Hiroyasu, Mitsunori Miki, Tomoharu Senda. GECCO 2007. Outline: Motivation, Objective, Methodology, Original MOCK


Presentation Transcript


  1. Multiobjective Clustering with Automatic k-determination for Large-scale Data Presenter: Shao-Wei Cheng Authors: Nobukazu Matake, Tomoyuki Hiroyasu, Mitsunori Miki, Tomoharu Senda GECCO 2007

  2. Outline • Motivation • Objective • Methodology • Original MOCK • New scalable k-determination scheme • Experiments and Results • Conclusion • Personal Comments

  3. Motivation • Web behavior mining has attracted a great deal of attention. • MOCK is powerful and strict, but its computational cost is too high when it is applied to clustering huge data sets. Too Much Data !!

  4. Objectives • Apply MOCK to web data clustering with a scalable automatic k-determination scheme. • Determine the appropriate k at low cost. • The task contains two complementary objectives: • Determine the appropriate k. • Find partitions between k clusters.

  5. Methodology • Original MOCK [Figure: four-step diagram of the original MOCK (First Step, Second Step, Third Step, Fourth Step), with a "Gap statistic" label]

  6. Methodology • New scalable k-determination scheme • First scheme: calculate adjacent angles • Second scheme [Figure: two-step diagram (First Step, Second Step) with one plot per scheme; axis labels x and y]

  7. Experiments

  8. Conclusion • The new scheme is able to determine the appropriate k at low cost, although its performance is poorer than that of the original algorithm. • It reduces the Pareto size by about 50-70%. • It does not need random data clustering.

  9. Personal Comments • Advantage • MOCK can be applied to large-scale data. • Drawback • Application • Web data.
