
Data Mining Techniques Clustering


moe

Presentation Transcript


  1. Data Mining Techniques Clustering

  2. Purpose • In clustering analysis, there is no pre-classified data • Instead, clustering analysis is a process where a set of objects is partitioned into several clusters • All members in one cluster are similar to each other and different from the members of other clusters, according to some similarity metric (e.g., the opposite of distance between objects)

  3. Cluster Analysis [Figure: customers (objects) plotted by two variables, X (Income) and Y (Age); a group of points is labelled as a cluster]

  4. Cluster Analysis • Data Matrix: n objects × p variables • Dissimilarity Matrix: n × n
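The step from an n × p data matrix to an n × n dissimilarity matrix can be sketched as follows. This is a minimal illustration that assumes Euclidean distance as the dissimilarity measure; the function name `dissimilarity_matrix` and the customer values are hypothetical and do not come from the slides.

```python
import math

def dissimilarity_matrix(data):
    """Return the n x n matrix of pairwise Euclidean distances for an n x p data matrix."""
    n = len(data)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(data[i], data[j])))
            # The matrix is symmetric, so each distance is stored twice.
            d[i][j] = d[j][i] = dist
    return d

# Hypothetical data matrix: 3 objects (customers) x 2 variables.
customers = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
D = dissimilarity_matrix(customers)
```

The diagonal is always zero because every object has zero distance to itself, so in practice only the lower (or upper) triangle needs to be stored.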

  5. Attribute Types Involved in Cluster Analysis • Interval Variables • An interval variable contains continuous measurements (e.g., height, weight, temperature, cost) that follow a linear scale • It is essential that intervals keep the same importance throughout the scale • Nominal Variables • A nominal variable takes on more than two states. For example, the eye color of a person can be blue, brown, green, or grey • These states may be coded as 1, 2, ..., M; however, neither their order nor the interval between any two states has any meaning

  6. Attribute Types Involved in Cluster Analysis • Ordinal Variables • An ordinal variable takes on more than two states. For example, you may ask someone to rate some paintings using the following categories: 1=detest, 2=dislike, 3=indifferent, 4=like, and 5=admire • The states of an ordinal variable are ordered in a meaningful sequence. However, the intervals between consecutive states are not necessarily equal • Binary Variables • Binary variables have only two possible states. For example, the gender of a person is either female or male
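The transcript does not preserve how each attribute type is compared, so the following is only a sketch of one common convention, assumed here: simple matching for nominal (and binary) variables, and mapping ordinal ranks onto [0, 1] so they can be treated like interval values. Both function names and the sample values are hypothetical.

```python
def nominal_dissimilarity(a, b):
    """Simple matching: fraction of nominal variables whose states differ."""
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)

def ordinal_to_unit_interval(rank, num_states):
    """Map a rank in 1..M onto [0, 1] so it can be used like an interval value."""
    return (rank - 1) / (num_states - 1)

# Two people described by two nominal variables (eye color, hair color).
person_1 = ["blue", "brown"]
person_2 = ["blue", "grey"]
d_nominal = nominal_dissimilarity(person_1, person_2)

# Painting ratings on the 5-state scale: 4 = like, 1 = detest.
like = ordinal_to_unit_interval(4, 5)
detest = ordinal_to_unit_interval(1, 5)
d_ordinal = abs(like - detest)
```

Coding nominal states as 1, 2, ..., M and subtracting them would impose an order and spacing that the states do not have, which is why the nominal case counts mismatches instead.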

  7. Dissimilarity (Distance) Measure

  8. Dissimilarity (Distance) Measure

  9. Dissimilarity (Distance) Measure

  10. Dissimilarity (Distance) Measure

  11. Dissimilarity (Distance) Measure

  12. Dissimilarity (Distance) Measure

  13. Dissimilarity (Distance) Measure

  14. Dissimilarity (Distance) Measure

  15. Dissimilarity (Distance) Measure

  16. Categorization of Clustering Methods • Exclusive vs. Non-Exclusive (Overlapping) • Hierarchical Methods vs. Partitioning Methods • Hierarchical Methods • Single Link Method • Complete Link Method • Partitioning Methods • Kohonen Self-Organizing Feature Maps • K-Means Methods • K-Medoids Methods (PAM, CLARA, CLARANS) • Density-Based Methods • …

  17. Hierarchical Methods • Dissimilarity Matrix (5 × 5)
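The slide's 5 × 5 matrix is not preserved in the transcript. As a rough sketch of how single link and complete link work on such a matrix, the code below runs a standard agglomerative procedure on a hypothetical dissimilarity matrix built from five made-up points on a line. The function name `agglomerative` and the data are assumptions for illustration only.

```python
def agglomerative(dist, num_clusters, linkage="single"):
    """Repeatedly merge the two closest clusters until num_clusters remain.

    single link:   cluster distance = smallest pairwise dissimilarity
    complete link: cluster distance = largest pairwise dissimilarity
    """
    pick = min if linkage == "single" else max
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = pick(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters

# Hypothetical 5 x 5 dissimilarity matrix from five points on a line.
points = [0, 4, 10, 17, 19]
dist = [[abs(p - q) for q in points] for p in points]

single = agglomerative(dist, 2, linkage="single")      # [[0, 1, 2], [3, 4]]
complete = agglomerative(dist, 2, linkage="complete")  # [[0, 1], [2, 3, 4]]
```

On this data the two linkages disagree about object 2: single link attaches it to the left group through its nearest neighbour, while complete link attaches it to the right group because that merge yields the smaller maximum dissimilarity.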

  18. K-Means Methods

  19. K-Means Methods

  20. K-Means Methods
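The content of the K-Means slides is not preserved in the transcript. The sketch below shows the standard assign-then-update iteration usually meant by "K-Means", under the assumptions that squared Euclidean distance is used and that the initial centers are supplied by the caller. The function name `k_means` and the sample points are hypothetical.

```python
def k_means(points, centers, max_iter=100):
    """Basic k-means: alternate assignment and mean-update steps until centers stop moving."""
    for _ in range(max_iter):
        # Assignment step: each point joins the group of its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            groups[nearest].append(p)
        # Update step: each center moves to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, groups

# Hypothetical 2-D data with two visible groups, and k = 2 initial centers.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centers, groups = k_means(points, [(1.0, 1.0), (8.0, 8.0)])
```

The result depends on the initial centers, so in practice the procedure is often run from several starting configurations.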

  21. K-Means Methods • Sensitive to outliers!
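One way to see the outlier sensitivity is to compare a cluster's mean, which K-Means uses as its center, with its medoid, the representative used by the K-Medoids methods listed earlier. The sketch below uses made-up one-dimensional values, and the helper names `mean` and `medoid` are hypothetical.

```python
def mean(values):
    """Arithmetic mean: the cluster center used by k-means."""
    return sum(values) / len(values)

def medoid(values):
    """The member with the smallest total distance to all other members."""
    return min(values, key=lambda m: sum(abs(m - v) for v in values))

# Hypothetical 1-D cluster, then the same cluster with one outlier added.
cluster = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = cluster + [100.0]

mean_before, mean_after = mean(cluster), mean(with_outlier)
medoid_before, medoid_after = medoid(cluster), medoid(with_outlier)
```

A single extreme value drags the mean far outside the original group, while the medoid remains one of the original members.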

  22. Exercise 7 • Number of clusters = 2 • Use Single Link, Complete Link, and K-Means to cluster the following data:
