Boosting Algorithm for Clustering

  1. Boosting Algorithm for Clustering
  Kuan-ming Lin, Dec. 1, 2005

  2. Agenda
  • Motivation
  • Changes from AdaBoost to boost-clustering
  • The boost-clustering algorithm
  • Examples
  • Discussions

  3. Motivation: want to improve “weak clustering algorithms”
  • Most clustering algorithms favor certain shapes
    • e.g. k-means performs well on spherical clusters
  • We want to generalize them to fit more complex shapes
  • Although labels are absent, some instances seem harder to cluster than others
    • Learning from these instances might reinforce the clustering algorithm

  4. Boost-clustering: counterpart of AdaBoost, but with more issues…
  • Essence of AdaBoost
    • Identify instances that were not learned well
    • Add weight to them and re-learn the weighted input
    • Output a combination of all learners
  • Analogy in boost-clustering
    • Identify instances that were not clustered well (how to define “well”?)
    • Add weight to them and re-run the clustering algorithm
    • Output a combination of all clustering results (clusters are not consistently labeled: how to combine the different results?)

  5. A solution
  • Frossyniotis, Likas, Stafylopatis, “A clustering method based on boosting”, Pattern Recognition Letters ’04
  • Fix the number of clusters to facilitate combining clusterings
  • Need a soft clustering, i.e. membership degrees of each instance to all clusters are generated
  • Define a pseudoloss to measure how well each instance is clustered
  • Effectiveness cannot be proved mathematically: there is no “error bound” result for the clustering problem
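
A minimal sketch, in Python, of one way to obtain such membership degrees from a plain k-means run. The helper name soft_memberships and the distance-to-membership conversion are assumptions; the paper defines its own soft weak clusterers.

import numpy as np
from sklearn.cluster import KMeans

def soft_memberships(km, X):
    """Convert a fitted k-means model into soft membership degrees h[i, c];
    each row sums to 1. The softmax over negative centroid distances is an
    assumption, not the paper's exact formulation."""
    dist = km.transform(X)                              # distance of each point to each centroid
    h = np.exp(dist.min(axis=1, keepdims=True) - dist)  # closer centroid -> larger degree
    return h / h.sum(axis=1, keepdims=True)

# example: membership degrees of 200 points to k = 2 clusters
X = np.random.default_rng(0).normal(size=(200, 2))
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
h = soft_memberships(km, X)                             # shape (200, 2), rows sum to 1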

  6. The algorithm
  Initialize weights w_i = 1/N for all instances i
  For t = 1 to T do:
  • Let x' = bootstrap sample drawn according to probability w
  • Call WeakCluster(x') to generate cluster set C
  • Get membership degrees h_{i,c} for all (i, c)
  • Renumber the cluster indices by fractions of shared instances with the old clusters
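
A sketch of one iteration of this loop, reusing soft_memberships from the previous snippet. The greedy overlap matching in renumber is one plausible reading of “by fractions of shared instances”, and all function names are assumptions.

import numpy as np
from sklearn.cluster import KMeans

def renumber(h_new, h_old):
    """Align the new cluster indices to the previous iteration's clusters by the
    fraction of (hardened) instances they share, so memberships can be combined."""
    new_lab, old_lab = h_new.argmax(axis=1), h_old.argmax(axis=1)
    k = h_new.shape[1]
    overlap = np.array([[np.mean((new_lab == a) & (old_lab == b))
                         for b in range(k)] for a in range(k)])
    perm = np.empty(k, dtype=int)
    for _ in range(k):                            # greedily match largest overlaps first
        a, b = np.unravel_index(overlap.argmax(), overlap.shape)
        perm[a] = b
        overlap[a, :] = -1.0
        overlap[:, b] = -1.0
    return h_new[:, np.argsort(perm)]             # column b now holds the cluster matched to old cluster b

def boost_cluster_iteration(X, w, k, h_old=None, rng=None):
    """One pass of the loop above: bootstrap by weight w, run the weak clusterer,
    get memberships for all instances, renumber against the previous result."""
    rng = np.random.default_rng(rng)
    idx = rng.choice(len(X), size=len(X), p=w)          # x' = bootstrap according to w
    km = KMeans(n_clusters=k, n_init=10).fit(X[idx])    # WeakCluster(x')
    h = soft_memberships(km, X)                         # h[i, c] for all instances
    return renumber(h, h_old) if h_old is not None else h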

  7. The algorithm, continued
  Define pseudoloss p_i for each instance:
  • Minmax: p_i = 1 − max_c h_{i,c} + min_c h_{i,c}
  • Entropy: p_i = −Σ_c (h_{i,c} log h_{i,c})
  Set ε = Σ_i (w_i · p_i) / 2 and α = log((1 − ε) / ε)
  Set new weights w_{t+1} = normalize(w_t · exp(−α · p))
  Set new h_{t+1} = Σ_t (α_t · h_t)
  Return clusters according to h_T
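
A sketch of the pseudoloss, weight update, and aggregation as written on this slide; variable names (e.g. H for the running sum Σ_t α_t h_t) are assumptions. Under the minmax definition, confidently clustered instances get p_i close to 0 and ambiguous ones get p_i close to 1.

import numpy as np

def pseudoloss(h, kind="minmax"):
    """Per-instance pseudoloss p_i computed from membership degrees h[i, c]."""
    if kind == "minmax":
        return 1.0 - h.max(axis=1) + h.min(axis=1)     # 1 - max_c h_ic + min_c h_ic
    return -(h * np.log(h + 1e-12)).sum(axis=1)        # entropy: -sum_c h_ic log h_ic

def boosting_update(w, h, H):
    """One weight-update / aggregation step following the slide's formulas."""
    p = pseudoloss(h)
    eps = np.sum(w * p) / 2.0                          # eps = sum_i (w_i * p_i) / 2
    alpha = np.log((1.0 - eps) / eps)                  # alpha = log((1 - eps) / eps)
    w_new = w * np.exp(-alpha * p)
    w_new /= w_new.sum()                               # normalize to a distribution
    H_new = H + alpha * h                              # aggregate: sum_t alpha_t * h_t
    return w_new, H_new

# final clustering after T iterations: labels = H.argmax(axis=1)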

  8. Example: the effect of weighting (1)
  [figure: clustering results at iteration = 0 and iteration = 1]

  9. Example: the effect of weighting (2)
  [figure: clustering results at iteration = 5 and iteration = 15 (stopping criterion)]
  • Problem: boundary instances form a cluster

  10. Banana example
  • Can loosen the shape restriction of k-means
  • Not quite, though: four clusters are needed here
  [figure: k-means vs. k-means + boost-clustering]
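
As a rough end-to-end illustration, the sketches above can be strung together on two interleaving half-moons, a stand-in for the banana-shaped data; the dataset, k = 4, and T = 15 are assumptions.

import numpy as np
from sklearn.datasets import make_moons

# uses soft_memberships, boost_cluster_iteration and boosting_update from the sketches above
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
k, T = 4, 15                                    # slide: four clusters are needed here
w = np.full(len(X), 1.0 / len(X))               # uniform initial weights
H = np.zeros((len(X), k))                       # aggregate membership degrees
h_prev = None
for t in range(T):
    h = boost_cluster_iteration(X, w, k, h_old=h_prev, rng=t)
    w, H = boosting_update(w, h, H)
    h_prev = h
labels = H.argmax(axis=1)                       # boosted clustering, to compare with plain k-means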

  11. Discussions
  • Performance is assessed only by experiments
  • For easy cases, not much improvement
    • Could even be worse, due to overemphasis on the boundary
  • Some benefit for irregular shapes
    • Can build more complex clusters
  • For really hard cases, still limited by the nature of the underlying clustering algorithm
    • e.g. k-means cannot be made to learn concentric circles

  12. Thank you
