Hybrid Hierarchical K-means Clustering and DBSCAN
Dr. Bernard Chen
Assistant Professor
Department of Computer Science
University of Central Arkansas
Outline
• Hierarchical Clustering
• Hybrid Hierarchical K-means Clustering
• DBSCAN
Motivation
• Among clustering algorithms, hierarchical and K-means clustering are the two most popular and classic methods. However, both have inherent disadvantages.
• K-means clustering requires the number of clusters to be specified in advance and chooses its initial centroids randomly; in other words, you don't know how to start.
• With hierarchical clustering, it is hard to decide where to cut the dendrogram.
Hybrid Hierarchical K-means Clustering (HHK) Algorithm
• In brief, roughly half of the data is clustered by hierarchical clustering, and K-means then takes over for the remainder.
• In order to generate super-rules, we let hierarchical clustering terminate when it has generated the largest number of clusters.
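The two HHK phases can be sketched in a few lines of Python. This is only one possible reading of the slide, not the author's implementation: it assumes single-linkage (nearest-neighbour) merging, counts only clusters with at least two points, stops at the merge where that count first peaks, and then uses the centroids of those clusters as the starting centroids for K-means. The names `hierarchical_seeds`, `kmeans` and `centroid` are hypothetical helpers introduced for illustration.

```python
import math
from itertools import combinations


def centroid(group):
    """Coordinate-wise mean of a list of points (tuples)."""
    return tuple(sum(c) / len(group) for c in zip(*group))


def hierarchical_seeds(points):
    """Phase 1 (assumed reading): merge closest pairs first and keep the
    partition with the largest number of multi-point clusters."""
    n = len(points)
    parent = list(range(n))   # union-find forest over point indices
    size = [1] * n            # cluster size, valid at each root

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = sorted(combinations(range(n), 2),
                   key=lambda ij: math.dist(points[ij[0]], points[ij[1]]))
    n_clusters, best, best_roots = 0, 0, list(range(n))
    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue
        # Two singletons create a cluster (+1); two clusters fuse into one (-1);
        # a singleton joining an existing cluster leaves the count unchanged.
        n_clusters += (size[ri] == 1 and size[rj] == 1) - (size[ri] > 1 and size[rj] > 1)
        parent[rj] = ri
        size[ri] += size[rj]
        if n_clusters > best:   # remember the first partition reaching a new peak
            best = n_clusters
            best_roots = [find(p) for p in range(n)]
    groups = {}
    for idx, root in enumerate(best_roots):
        groups.setdefault(root, []).append(points[idx])
    return [centroid(g) for g in groups.values() if len(g) > 1]


def kmeans(points, seeds, max_iter=100):
    """Phase 2: plain Lloyd iterations started from the given seeds."""
    centroids, labels = list(seeds), None
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:   # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:            # keep the old centroid if a cluster empties
                centroids[c] = centroid(members)
    return labels, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seeds = hierarchical_seeds(points)
labels, centroids = kmeans(points, seeds)
```

In this sketch the hierarchical phase supplies both the number of clusters and the initial centroids, which are the two things the Motivation slide says K-means cannot choose on its own.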
Hierarchical Clustering
[Figure: Venn diagram of clustered data and the corresponding dendrogram]
From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt
[Figure: Nearest Neighbor, Level 2, k = 1 clusters]
From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt
Density-Based Clustering Methods
• Clustering based on density (local cluster criterion), such as density-connected points
• Major features:
  • Discover clusters of arbitrary shape
  • Handle noise
  • One scan
  • Need density parameters as termination condition
DBSCAN
• Two parameters:
  • Eps: maximum radius of the neighbourhood
  • MinPts: minimum number of points in the Eps-neighbourhood of a point
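As a small illustration of the Eps parameter, the Eps-neighbourhood of a point can be computed as below. This is a minimal sketch assuming Euclidean distance and points stored as tuples; `eps_neighborhood` is a hypothetical helper name, and the sample coordinates are made up for the example.

```python
import math


def eps_neighborhood(points, q, eps):
    """Return every point whose distance to q is at most eps.

    q counts as its own neighbour when it is part of points.
    """
    return [p for p in points if math.dist(p, q) <= eps]


points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.8), (3.0, 3.0)]
n_q = eps_neighborhood(points, (0.0, 0.0), eps=1.0)
```

MinPts is then compared against `len(n_q)` to decide whether the point sits in a dense region.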
DBSCAN
• Directly density-reachable: a point p is directly density-reachable from a point q w.r.t. Eps and MinPts if
  • p belongs to N_Eps(q)
  • q satisfies the core point condition: |N_Eps(q)| >= MinPts
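The two conditions translate almost directly into code. The sketch below assumes Euclidean distance and that a point counts as a member of its own Eps-neighbourhood; the function name and the sample points are hypothetical. It also shows that the relation is not symmetric: a border point can be directly density-reachable from a core point without the reverse holding.

```python
import math


def directly_density_reachable(points, p, q, eps, min_pts):
    """True if p is directly density-reachable from q w.r.t. eps and min_pts."""
    n_q = [x for x in points if math.dist(x, q) <= eps]   # N_Eps(q)
    return p in n_q and len(n_q) >= min_pts               # membership + core condition


points = [(0, 0), (1, 0), (0, 1), (-1, 0)]
core, border = (0, 0), (1, 0)

# (0, 0) has four points within Eps = 1 (itself included), so it is a core point.
border_from_core = directly_density_reachable(points, border, core, eps=1, min_pts=4)
# (1, 0) has only two points within Eps = 1, so nothing is reachable from it.
core_from_border = directly_density_reachable(points, core, border, eps=1, min_pts=4)
```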
DBSCAN: Density Based Spatial Clustering of Applications with Noise
• Relies on a density-based notion of cluster: a cluster is defined as a maximal set of density-connected points
• Discovers clusters of arbitrary shape in spatial databases with noise
[Figure: core, border and outlier points, with Eps = 1cm and MinPts = 5]
DBSCAN
• Arbitrarily select a point p.
• Retrieve all points density-reachable from p w.r.t. Eps and MinPts.
• If p is a core point, a cluster is formed.
• If p is a border point, no points are density-reachable from p, and DBSCAN visits the next point of the database.
• Continue the process until all of the points have been processed.
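The steps above can be sketched as a short, unoptimised Python function. It is an illustration under stated assumptions rather than a reference implementation: distance is Euclidean, a point belongs to its own Eps-neighbourhood, neighbourhoods are found by a brute-force scan instead of a spatial index, and noise is marked with the label -1 (a convention chosen here, not taken from the slides).

```python
import math

NOISE = -1  # label for points that end up in no cluster (assumed convention)


def dbscan(points, eps, min_pts):
    """Return one cluster label per point, or NOISE for outliers."""
    labels = [None] * len(points)   # None means "not yet visited"

    def neighbors(i):
        # Brute-force Eps-neighbourhood of point i (includes i itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            # Not a core point: mark as noise for now; it may still be
            # claimed later as a border point of some cluster.
            labels[i] = NOISE
            continue
        # Core point: start a new cluster and collect everything
        # density-reachable from it.
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: join, do not expand
            if labels[j] is not None:
                continue                 # already assigned
            labels[j] = cluster
            nj = neighbors(j)
            if len(nj) >= min_pts:       # j is itself a core point: keep expanding
                queue.extend(nj)
        cluster += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),          # dense group
          (5, 5),                                   # isolated point
          (10, 10), (10, 11), (11, 10), (11, 11)]   # second dense group
labels = dbscan(points, eps=1.5, min_pts=3)
```

With these sample points the two dense groups become separate clusters and the isolated point keeps the noise label. The brute-force neighbourhood scan makes the sketch quadratic in the number of points.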