
Mean-shift outlier detection




Presentation Transcript


  1. Mean-shift outlier detection Jiawei Yang Susanto Rahardja Pasi Fränti 18.11.2018

  2. Clustering

  3. Clustering with noisy data

  4. How to deal with outliers in clustering. Approach 1: cluster. Approach 2: outlier removal, then cluster. Approach 3: outlier removal, then cluster. Mean-shift approach: mean-shift, then cluster.

  5. Mean-shift

  6. K-nearest neighbors: the k-neighborhood of a point (here k = 4).
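
A minimal numpy sketch of this k-neighborhood step (my own illustration, not code from the slides; the toy data X is made up):

    import numpy as np

    def knn_indices(X, i, k=4):
        # Indices of the k nearest neighbors of point i, using Euclidean
        # distance, with the point itself excluded (an assumption here).
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf
        return np.argsort(d)[:k]

    X = np.random.rand(100, 2)   # toy 2-D data
    print(knn_indices(X, 0))     # the 4-neighborhood of point 0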

  7. Expected result: for a normal point, the mean of its neighbors lies close to the point itself.

  8. Result with a noise point: the k-neighborhood of a noise point lies in the nearest cluster, so the mean of its neighbors is pulled away from the noise point and toward the cluster.

  9. Part I: Noise removal. Fränti and Yang, "Medoid-shift noise removal to improve clustering", Int. Conf. Artificial Intelligence and Soft Computing (ICAISC), June 2018.

  10. Mean-shift process: move every point to the mean of its neighbors.
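
A minimal sketch of one such pass, assuming k nearest neighbors with Euclidean distance (my own numpy illustration, not the authors' code):

    import numpy as np

    def mean_shift_step(X, k=4):
        # One mean-shift pass: move every point to the mean of its
        # k nearest neighbors (the point itself excluded, an assumption).
        Y = np.empty_like(X)
        for i in range(len(X)):
            d = np.linalg.norm(X - X[i], axis=1)
            d[i] = np.inf                        # do not count the point itself
            Y[i] = X[np.argsort(d)[:k]].mean(axis=0)
        return Y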

  11. Mean or medoid? Comparison of mean-shift and medoid-shift.

  12. Medoid-shift algorithm (runnable sketch below). Fränti and Yang, "Medoid-shift noise removal to improve clustering", Int. Conf. Artificial Intelligence and Soft Computing (ICAISC), June 2018.
  • REPEAT 3 TIMES:
  • Calculate kNN(x)
  • Calculate the medoid M of the neighbors
  • Replace point x by the medoid M
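
A runnable version of this pseudocode, assuming Euclidean distance and that the medoid is taken over the k neighbors only (a sketch under those assumptions, not the authors' implementation):

    import numpy as np

    def medoid_shift(X, k=4, iterations=3):
        # REPEAT 3 TIMES (as on the slide): replace every point by the
        # medoid of its k nearest neighbors.
        X = X.copy()
        for _ in range(iterations):
            Y = np.empty_like(X)
            for i in range(len(X)):
                d = np.linalg.norm(X - X[i], axis=1)
                d[i] = np.inf                    # exclude the point itself
                P = X[np.argsort(d)[:k]]         # the k-neighborhood of point i
                # Medoid: the neighbor minimizing total distance to the others
                D = np.linalg.norm(P[:, None] - P[None, :], axis=2)
                Y[i] = P[D.sum(axis=1).argmin()]
            X = Y
        return X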

  13. Iterative processes

  14. Effect on clustering result: noisy original CI = 4; iteration 1 CI = 4; iteration 2 CI = 3; iteration 3 CI = 0.
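
For reference, the Centroid Index (CI) counts cluster-level mismatches between two centroid sets; CI = 0 means every centroid has a one-to-one match. A minimal sketch under my reading of the measure (not code from the slides):

    import numpy as np

    def centroid_index(A, B):
        # Map each centroid in A to its nearest centroid in B; centroids of B
        # never chosen are "orphans". CI is the worse count of both directions.
        def orphans(A, B):
            nearest = {int(np.argmin(np.linalg.norm(B - a, axis=1))) for a in A}
            return len(B) - len(nearest)
        return max(orphans(A, B), orphans(B, A))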

  15. Experiments

  16. Datasets: six clustering benchmark datasets with N = 5000 points and k = 15, 20, 35 or 50 clusters. P. Fränti and S. Sieranoja, "K-means properties on six clustering benchmark datasets", Applied Intelligence, 2018.

  17. Noise types: random noise and data-dependent noise, illustrated on datasets S1 and S3.

  18. Results for noise type 1: the Centroid Index (CI) reduces from 5.1 to 1.4. KM = K-means with random initialization; RS = random swap (P. Fränti, "Efficiency of random swap clustering", Journal of Big Data, 5:13, 1-29, 2018).

  19. Results for noise type 2: the Centroid Index (CI) reduces from 5.0 to 1.1. KM = K-means with random initialization; RS = random swap (P. Fränti, "Efficiency of random swap clustering", Journal of Big Data, 5:13, 1-29, 2018).

  20. Part II: Outlier detection Yang, Rahardja, Fränti, "Mean-shift outlier detection", Int. Conf. Fuzzy Systems and Data Mining (FSDM), November 2018.

  21. Outlier scores D = |x - y| with k = 4: a distant noise point gets a large score (203), a normal point a small one (12).

  22. Mean-shift for outlier detection (sketch below)
  • Algorithm: MOD
  • Calculate the mean-shift y for every point x
  • Outlier score Di = |yi - xi|
  • SD = standard deviation of all Di
  • IF Di > SD THEN xi is an OUTLIER
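
Putting the steps together, a self-contained numpy sketch (my own illustration; k = 4 and the exclude-self neighbor convention are assumptions, and the number of shift iterations is a free parameter, cf. slide 23):

    import numpy as np

    def mod(X, k=4, iterations=1):
        # Mean-shift outlier detection (MOD): shift every point, score it by
        # how far it moved, and flag scores above one standard deviation.
        Y = X.copy()
        for _ in range(iterations):
            Z = np.empty_like(Y)
            for i in range(len(Y)):
                d = np.linalg.norm(Y - Y[i], axis=1)
                d[i] = np.inf                    # exclude the point itself
                Z[i] = Y[np.argsort(d)[:k]].mean(axis=0)
            Y = Z
        D = np.linalg.norm(X - Y, axis=1)        # outlier scores D_i = |y_i - x_i|
        return D > D.std()                       # SD threshold from the slide

For example, outliers = X[mod(X)] returns the flagged points; no threshold parameter has to be tuned, since SD is estimated from the scores themselves.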

  23. Result after three iterations: the point x moves successively to y1, y2 and y3, and the final score is D = |x - y3|.

  24. Outlier scores D = |x - y| for every point (values ranging from 4 to 203); the threshold SD = 52 separates outliers from normal points.

  25. Distribution of the scores for normal points vs. outliers.

  26. Experimental comparisons

  27. Results for noise type 1: a priori noise level of 7% vs. automatically estimated threshold (SD).

  28. Results for noise type 2: a priori noise level of 7% vs. automatically estimated threshold (SD).

  29. Conclusions
  • Why to use: simple but effective!
  • How it performs: outperforms existing methods (98-99% detection accuracy)!
  • Usefulness: no threshold parameter needed!
