Pattern Recognition: Statistical and Neural

Nanjing University of Science & Technology. Pattern Recognition: Statistical and Neural. Lonnie C. Ludeman Lecture 26 Nov 4, 2005. Lecture 26 Topics. General Concept of Clustering Basic problems in determining clusters Definition of distance functions between clusters

Presentation Transcript


  1. Nanjing University of Science & Technology. Pattern Recognition: Statistical and Neural. Lonnie C. Ludeman. Lecture 26, Nov 4, 2005

  2. Lecture 26 Topics
     • General Concept of Clustering
     • Basic problems in determining clusters (Example 1, Example 2)
     • Definition of distance functions between clusters
     • Introduce the K-Means Clustering Algorithm

  3. Clustering is the art of grouping together pattern vectors that in some sense belong together because they have similar characteristics and are different from other pattern vectors. In the most general problem the number of clusters or subgroups is unknown, as are the properties that make them similar.

  4. Question: How do we start the process of finding clusters and identifying similarities? Answer: First, realize that clustering is an art and that there is no correct answer, only feasible alternatives. Second, explore structures of data, similarity measures, and limitations of various clustering procedures.

  5. Formalization of the Problem of Clustering. Given a set S of $N_S$ n-dimensional pattern vectors, $S = \{ x_j ;\ j = 1, 2, \ldots, N_S \}$, clustering is the process of partitioning S into K subsets $Cl_k$, $k = 1, 2, \ldots, K$, called clusters, that satisfy the following conditions.

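  6. Conditions on the clusters:
     1. The members of each subset are in some sense similar to one another and not similar to members of the other subsets.
     2. $Cl_k \neq \Phi$ for every $k$ (not empty).
     3. $Cl_k \cap Cl_j = \Phi$ for $k \neq j$ (pairwise disjoint).
     4. $\bigcup_{k=1}^{K} Cl_k = S$ (exhaustive).
     Here $\Phi$ is the null set.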
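Conditions 2 to 4 (not empty, pairwise disjoint, exhaustive) can be checked mechanically; condition 1 (similarity) cannot, since it depends on the chosen notion of similarity. The following is a minimal Python sketch, assuming patterns are identified by hashable labels. The function name `is_valid_partition` is hypothetical and not part of the lecture.

```python
def is_valid_partition(S, clusters):
    """Check the set-theoretic clustering conditions for a list of sets.

    S        : iterable of pattern labels (assumed hashable)
    clusters : list of sets, one per cluster
    """
    # Condition 2: no cluster is the null set.
    non_empty = all(len(c) > 0 for c in clusters)
    # Condition 3: every pair of distinct clusters has an empty intersection.
    disjoint = all(
        clusters[i].isdisjoint(clusters[j])
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
    )
    # Condition 4: the union of all clusters equals S.
    exhaustive = set().union(*clusters) == set(S)
    return non_empty and disjoint and exhaustive


S = {"a", "b", "c", "d"}
print(is_valid_partition(S, [{"a", "c"}, {"b", "d"}]))       # valid partition
print(is_valid_partition(S, [{"a", "c"}, {"c", "b", "d"}]))  # "c" appears twice
```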

  7. Illustration of Clusters and Cluster centers

  8. We will now look at two examples that illustrate problems in performing meaningful clustering. Example 1: Problems with scaling. Example 2: The nonuniqueness of results.

  9. Example 1: Given the data below, obtained by measuring the weight and diameter of 4 large foam balls labeled a, b, c, and d. Find two clusters from the set { a, b, c, d }

  11. Solution: The plot of the points in the 2-dimensional pattern space is given below [figure]. By closeness in pattern space, select Cl1 = { a, c } and Cl2 = { b, d }.

  13. The plot of the same points in the 2-dimensional pattern space, with Diameter shown in inches rather than feet (different scale), is given below [figure]. By closeness in pattern space, select Cl1 = { a, b } and Cl2 = { c, d }.

  18. Which set of clusters is the correct answer?
      #1: Cl1 = { a, c }, Cl2 = { b, d } (measured in feet)
      #2: Cl1 = { a, b }, Cl2 = { c, d } (measured in inches)
      #3: Cl1 = { a, d }, Cl2 = { b, c } (other measurement units)
      #4: None of the above
      #5: All of the above

  19. Answer: There is no correct answer. The clusters provide us with different interpretations of the data, where the closeness of patterns is measured with different definitions of similarity.

  20. One approach to solving the scaling problem is to normalize each dimension separately when the dimensions represent different properties, such as weight and diameter. For our problem we have [figure: Diameter and Weight axes, each normalized to 1].
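The scaling problem of Example 1 and the normalization remedy can be sketched in Python. The measured values from the lecture are not in this transcript, so the numbers below are hypothetical, chosen only so that feet and inches give different groupings. The helper names `best_pairing` and `min_max_normalize` are also assumptions, and min-max scaling to [0, 1] is one common normalization, not necessarily the one used on the slide.

```python
import numpy as np

LABELS = ["a", "b", "c", "d"]
# Hypothetical (weight, diameter in feet) measurements; not the lecture's data.
FEET = np.array([[1.0, 1.0], [3.0, 1.1], [1.2, 1.5], [3.5, 1.6]])
# Same balls with the diameter column converted to inches.
INCHES = FEET * np.array([1.0, 12.0])


def best_pairing(points):
    """Split four points into two pairs with the smallest total within-pair distance."""
    best_cost, best = None, None
    for partner in (1, 2, 3):  # candidate partner of the first point
        i, j = [m for m in (1, 2, 3) if m != partner]
        cost = (np.linalg.norm(points[0] - points[partner])
                + np.linalg.norm(points[i] - points[j]))
        if best_cost is None or cost < best_cost:
            best_cost = cost
            best = frozenset({
                frozenset({LABELS[0], LABELS[partner]}),
                frozenset({LABELS[i], LABELS[j]}),
            })
    return best


def min_max_normalize(points):
    """Rescale each dimension separately to [0, 1] (assumes no constant column)."""
    lo = points.min(axis=0)
    hi = points.max(axis=0)
    return (points - lo) / (hi - lo)


print(best_pairing(FEET))    # grouping with diameter in feet
print(best_pairing(INCHES))  # grouping with diameter in inches
# After normalization the choice of unit no longer matters.
print(best_pairing(min_max_normalize(FEET)) == best_pairing(min_max_normalize(INCHES)))
```

Because min-max scaling removes any per-dimension scale factor, the normalized coordinates are the same whether the diameter was recorded in feet or inches.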

  21. Example 2. Given a standard (USA) deck of 52 playing cards, each card is specified by the pair of values (denomination, suit), where denomination is from { 2, 3, ..., 10, J, Q, K, A } and suit is from { ♠, ♥, ♦, ♣ }. Find a reasonable clustering of the data.

  22. Given Patterns [figure]

  23. Solution 1: [figure]

  24. Solution 2: [figure]

  25. Solution 3: [figure]

  26. Solution 4: [figure]

  27. Solution 5: [figure]

  28. Solution 6: Another choice for 26 clusters [figure]. Solution 7: Still another choice for 26 clusters [figure].
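The nonuniqueness in Example 2 can be illustrated with a short Python sketch. The groupings shown on the slides are not recoverable from this transcript, so the ones below (by suit, by color, by denomination, and by denomination together with color) are illustrative assumptions only. The helper `cluster_by` and the suit names are hypothetical.

```python
from collections import defaultdict

DENOMINATIONS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
RED_SUITS = {"hearts", "diamonds"}

# Each card is the pair (denomination, suit), as in the example statement.
deck = [(d, s) for d in DENOMINATIONS for s in SUITS]


def cluster_by(cards, key):
    """Group cards that share the same value of key(card)."""
    clusters = defaultdict(list)
    for card in cards:
        clusters[key(card)].append(card)
    return dict(clusters)


def color(card):
    return "red" if card[1] in RED_SUITS else "black"


# Four equally defensible clusterings of the same 52 patterns (illustrative only).
by_suit = cluster_by(deck, lambda c: c[1])
by_color = cluster_by(deck, color)
by_denomination = cluster_by(deck, lambda c: c[0])
by_denomination_and_color = cluster_by(deck, lambda c: (c[0], color(c)))

for name, clusters in [
    ("suit", by_suit),
    ("color", by_color),
    ("denomination", by_denomination),
    ("denomination and color", by_denomination_and_color),
]:
    print(name, len(clusters))
```

Each grouping is a valid partition of the deck; which one is "reasonable" depends entirely on the definition of similarity.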

  29. Concentrate now on quantitative data and examine measures of similarity between pattern samples and clusters. Euclidean distance between two pattern vectors x and y: $d(x, y) = \| x - y \| = \sqrt{(x - y)^T (x - y)}$. The smaller the distance, the larger the similarity.

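  30. Measures of distance between two pattern classes $S_i$ and $S_j$: 1. minimum distance 2. average distance

  31. 3. distance between means, where the mean of each class is the average of its pattern vectors 4. distance between medians

  32. 5. maximum distance. Interpretation of $d_{\max}$, $d_{\text{mean}}$, $d_{\min}$ [figure]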
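The formulas for these five measures are not preserved in this transcript. The Python sketch below assumes their usual definitions: the minimum, average, and maximum of the Euclidean distance over all pairs (x in Si, y in Sj), plus the Euclidean distance between the class means and between the class medians. Taking the median component by component is an additional assumption, and all function names are hypothetical.

```python
import numpy as np


def pairwise_distances(Si, Sj):
    """Matrix of Euclidean distances between every x in Si and every y in Sj."""
    return np.linalg.norm(Si[:, None, :] - Sj[None, :, :], axis=2)


def d_min(Si, Sj):
    return pairwise_distances(Si, Sj).min()   # closest pair


def d_avg(Si, Sj):
    return pairwise_distances(Si, Sj).mean()  # average over all pairs


def d_max(Si, Sj):
    return pairwise_distances(Si, Sj).max()   # farthest pair


def d_mean(Si, Sj):
    # Distance between the two class means.
    return np.linalg.norm(Si.mean(axis=0) - Sj.mean(axis=0))


def d_median(Si, Sj):
    # Distance between component-wise medians (assumed definition).
    return np.linalg.norm(np.median(Si, axis=0) - np.median(Sj, axis=0))


# Two small hypothetical classes on a line.
Si = np.array([[0.0, 0.0], [1.0, 0.0]])
Sj = np.array([[3.0, 0.0], [5.0, 0.0]])
for f in (d_min, d_avg, d_mean, d_median, d_max):
    print(f.__name__, f(Si, Sj))
```

For any two classes, d_min is never larger than d_avg, which is never larger than d_max, since all three summarize the same set of pairwise distances.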

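  33. Measure of Performance for Clustering. Overall performance measure J for a given set of clusters $Cl_k$, $k = 1, 2, \ldots, K$: $J = \sum_{k=1}^{K} \sum_{x_i \in Cl_k} \| x_i - M_k \|^2$, where the mean of each cluster is $M_k = \frac{1}{N_k} \sum_{x_i \in Cl_k} x_i$ and $N_k$ is the number of patterns in $Cl_k$.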

  34. If $K = N_S$, the number of samples, then each cluster center equals the single sample in its cluster and the performance measure J would be 0. If $K = 1$, then all samples are in just one cluster and J would be maximum. There is no useful information in either one of these conditions!
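The two extreme cases can be checked numerically. This is a minimal Python sketch that assumes J is the sum of squared Euclidean distances from each pattern to the mean of its cluster; the four sample points and the function name `performance_J` are hypothetical.

```python
import numpy as np


def performance_J(patterns, labels):
    """Sum over clusters of squared distances from each member to the cluster mean."""
    J = 0.0
    for k in np.unique(labels):
        members = patterns[labels == k]
        M_k = members.mean(axis=0)         # mean of cluster k
        J += np.sum((members - M_k) ** 2)  # squared distances to that mean
    return J


# Hypothetical patterns: two tight pairs that are far apart.
patterns = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])

J_one_cluster = performance_J(patterns, np.array([0, 0, 0, 0]))  # K = 1
J_two_clusters = performance_J(patterns, np.array([0, 0, 1, 1]))  # K = 2
J_all_single = performance_J(patterns, np.array([0, 1, 2, 3]))   # K = N_S

print(J_one_cluster, J_two_clusters, J_all_single)
```

Under this assumed form, J alone cannot choose K, because it keeps decreasing as K grows; it is only meaningful for comparing clusterings with the same number of clusters.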

  35. Methods for Clustering Quantitative Data
      1. K-Means Clustering Algorithm *
      2. Hierarchical Clustering Algorithm
      3. ISODATA Clustering Algorithm
      4. Fuzzy Clustering Algorithm
      * Only introduced in this lecture; details follow in the next lecture.

  36. K-Means Clustering Algorithm: Basic Procedure
      1. Randomly select K cluster centers from pattern space.
      2. Distribute the set of patterns to the cluster centers using minimum distance.
      3. Compute new cluster centers for each cluster.
      4. Repeat steps 2 and 3 until the cluster centers do not change.

  37. Flow Diagram for K-Means Algorithm
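The basic K-Means procedure can be sketched in Python with NumPy. This is an illustrative sketch under stated assumptions, not the lecture's own implementation: the initial centers are drawn from the patterns themselves (one common way to pick points "from pattern space"), an empty cluster keeps its previous center, and the `seed` and `max_iter` parameters are additions for reproducibility and safety.

```python
import numpy as np


def k_means(patterns, k, seed=0, max_iter=100):
    """Basic K-Means: returns (cluster centers, cluster index of each pattern)."""
    rng = np.random.default_rng(seed)
    # Randomly select K distinct patterns as the initial centers (assumption).
    centers = patterns[rng.choice(len(patterns), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Assign each pattern to the nearest center (minimum Euclidean distance).
        dists = np.linalg.norm(patterns[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Compute the new center of each cluster as the mean of its members.
        new_centers = centers.copy()
        for j in range(k):
            members = patterns[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old center
                new_centers[j] = members.mean(axis=0)
        # Stop when the cluster centers no longer change.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels


# Hypothetical data: two well-separated groups of three patterns each.
patterns = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                     [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centers, labels = k_means(patterns, k=2)
print(centers)
print(labels)
```

Different random starts can lead to different final clusters on less clearly separated data, which is one reason the result of K-Means is not unique.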

  38. Summary of Lecture 26
      • Presented the General Concept of Clustering
      • Discussed Basic problems in determining clusters by presenting Example 1 and Example 2
      • Gave the Definition of distance functions between clusters
      • Introduced the K-Means Clustering Algorithm

  39. End of Lecture 26
