
Adaptive Cluster Ensemble Selection


Presentation Transcript


  1. Adaptive Cluster Ensemble Selection Javad Azimi, Xiaoli Fern {azimi, xfern}@eecs.oregonstate.edu Oregon State University Presenter: Javad Azimi.

  2. Cluster Ensembles • A data set is fed to n different clustering methods (Clustering 1, Clustering 2, …, Clustering n). • Each method generates a different result (Result 1, Result 2, …, Result n). • A consensus function combines the results to obtain the final clusters.
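For concreteness, here is a minimal sketch of the ensemble-generation step, assuming scikit-learn's KMeans as the base clusterer and random restarts as the source of diversity (the slides use K-means and MSF; generate_ensemble is a hypothetical helper name, and the consensus step is sketched after slide 12):

```python
import numpy as np
from sklearn.cluster import KMeans

def generate_ensemble(X, n_members=100, n_clusters=3, seed=0):
    """Run the base clusterer n_members times with different random
    initializations; each run yields one ensemble member, i.e. a
    label vector over the data points."""
    rng = np.random.RandomState(seed)
    ensemble = []
    for _ in range(n_members):
        km = KMeans(n_clusters=n_clusters, n_init=1,
                    random_state=rng.randint(2**31 - 1))
        ensemble.append(km.fit_predict(X))
    return ensemble
```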

  3. Cluster Ensembles: Challenge • One can easily generate hundreds or thousands of clustering results. • Is it good to always include all clustering results in the ensemble? • We may want to be selective. • Which subset is the best?

  4. What makes a good ensemble? • Diversity • Members should be different from each other. • Measured by Normalized Mutual Information (NMI): low NMI between two members means high diversity. • Select a subset of ensemble members based on diversity: • Hadjitodorov et al. 2005: an ensemble with median diversity usually works better. • Fern and Lin 2008: cluster the ensemble members into distinct groups and then choose one from each group.
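As an illustration, pairwise diversity via NMI might be computed as below, a sketch using scikit-learn's normalized_mutual_info_score (average_pairwise_nmi is a hypothetical helper name; since NMI is 1 for identical partitions, a low average indicates a diverse ensemble):

```python
from itertools import combinations
from sklearn.metrics import normalized_mutual_info_score

def average_pairwise_nmi(ensemble):
    """Mean NMI over all pairs of ensemble members.
    NMI equals 1 for identical partitions, so a LOW average
    indicates a DIVERSE ensemble."""
    scores = [normalized_mutual_info_score(a, b)
              for a, b in combinations(ensemble, 2)]
    return sum(scores) / len(scores)
```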

  5. Diversity in Cluster Ensembles: Drawback • Existing methods design selection heuristics without considering the characteristics of the data sets and ensembles. • Our goal: select adaptively based on the behavior of the data set and the ensemble itself.

  6. Our Approach • We empirically examined the behavior of the ensembles and the clustering performance on 4 different data sets. • We used these four training sets to learn an adaptive strategy. • We evaluated the learned strategy on separate test data sets. • The 4 training data sets: Iris, Soybean, Wine, Thyroid.

  7. An Empirical Investigation • Generate a large ensemble • 100 independent runs of two different algorithms (K-means and MSF) • Analyze the diversity of the generated ensemble • Generate a final result P* based on all ensemble members • Compute the NMI between ensemble members and P* • Examine the distribution of the diversity • Consider different potential subsets selected based on diversity and evaluate their clustering performance
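The per-member diversity analysis in the last three bullets might look like the following sketch, assuming P* has already been obtained from all members (diversity_profile is a hypothetical helper name):

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def diversity_profile(ensemble, p_star):
    """NMI between every ensemble member and the consensus
    partition P*; the distribution of these values is what
    distinguishes stable from unstable ensembles."""
    nmis = np.array([normalized_mutual_info_score(m, p_star)
                     for m in ensemble])
    return nmis  # e.g. inspect np.histogram(nmis) or nmis.mean()
```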

  8. Observation #1 • There are two distinct types of ensembles: • Stable: most ensemble members are similar to P*. • Unstable: most ensemble members are different from P*. [Figure: histograms of NMI with P* (x-axis) versus # of ensembles (y-axis), one stable and one unstable.]

  9. Consider Different Subsets • Compute the NMI between each member and P*. • Sort the NMI values. • Consider 4 different subsets, taken from the members sorted by NMI with P*: the full ensemble (F), and the low-diversity (L), medium-diversity (M), and high-diversity (H) subsets (high NMI with P* = low diversity).
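One way to carve out these subsets is sketched below; the equal-thirds split is an assumption for illustration (the paper's exact subset sizes may differ), and diversity_subsets is a hypothetical helper name:

```python
import numpy as np

def diversity_subsets(ensemble, nmis):
    """Sort members by NMI with P* (descending). High NMI means
    low diversity, so the front third forms the L subset and the
    back third the H subset; F is the full ensemble."""
    order = np.argsort(nmis)[::-1]            # most similar to P* first
    third = len(order) // 3
    return {
        "F": list(range(len(ensemble))),      # full ensemble
        "L": order[:third].tolist(),          # low diversity
        "M": order[third:2 * third].tolist(), # medium diversity
        "H": order[2 * third:].tolist(),      # high diversity
    }
```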

  10. Observation #2 • Different subsets work best for stable and unstable ensembles: • Stable: subsets F and L worked well. • Unstable: subset H worked well.

  11. Our Final Strategy • Generate a large ensemble Π (200 solutions). • Obtain the consensus partition P*. • Compute the NMI between ensemble members and P* and sort the values in decreasing order. • If the average NMI > 0.5, classify the ensemble as stable and output P* as the final partition. • Otherwise, classify the ensemble as unstable, select the H (high-diversity) subset, and output its consensus clustering.
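Putting the slide's steps together as a sketch (it reuses the hypothetical helpers generate_ensemble, diversity_profile, and diversity_subsets sketched above, plus a consensus function sketched after the next slide; the 0.5 threshold is the one stated on this slide):

```python
def adaptive_selection(X, n_clusters, consensus):
    """Adaptive cluster ensemble selection: build a large ensemble,
    classify it as stable or unstable via the average NMI with P*,
    then either output P* directly or re-run the consensus function
    on the high-diversity (H) subset."""
    ensemble = generate_ensemble(X, n_members=200, n_clusters=n_clusters)
    p_star = consensus(ensemble, n_clusters)
    nmis = diversity_profile(ensemble, p_star)
    if nmis.mean() > 0.5:                        # stable ensemble
        return p_star
    subsets = diversity_subsets(ensemble, nmis)  # unstable: use H subset
    high = [ensemble[i] for i in subsets["H"]]
    return consensus(high, n_clusters)
```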

  12. Experimental Setup • 100 independent runs each of K-means and MSF are used to generate the ensemble members. • Consensus function: average-link HAC (hierarchical agglomerative clustering) on the co-association matrix.
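A sketch of such a consensus function: build the co-association matrix (the fraction of ensemble members placing each pair of points in the same cluster), turn it into a distance, and cut an average-linkage dendrogram. The conversion distance = 1 − co-association is a standard choice assumed here, not taken from the slides:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def consensus(ensemble, n_clusters):
    """Average-link HAC over the co-association matrix."""
    n = len(ensemble[0])
    co = np.zeros((n, n))
    for labels in ensemble:
        labels = np.asarray(labels)
        co += (labels[:, None] == labels[None, :])
    co /= len(ensemble)            # co-association frequencies in [0, 1]
    dist = 1.0 - co                # similarity -> distance
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust") - 1
```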

  13. Experimental Results: Data Set Classification

  14. Experimental Results: Results on Different Subsets

  15. Experimental Results: Proposed Method versus Fern-Lin

  16. Experimental Results: Selecting a Method vs. Selecting the Best Ensemble Members • Which members are selected for the final clustering? [Figure: ensemble members sorted by NMI with P*, marked by algorithm (MSF vs. K-means); in some cases only MSF members are selected, in others both MSF and K-means members are selected.]

  17. Experimental Results: How Accurate Are the Selected Ensemble Members? • x-axis: members in decreasing order of their NMI with P* (most similar to P* on the left, most dissimilar on the right). • y-axis: their corresponding NMI with the ground-truth labels. [Figure: the selected ensemble members are among the more accurate ones.]

  18. Conclusion • We empirically learned a simple ensemble selection strategy: • First classify a given ensemble as stable or unstable. • Then select a subset according to the classification result. • On separate test data sets, we achieved excellent results: • Sometimes significantly better than the best ensemble member. • Outperforms an existing selection method.
