
Unsupervised Learning


Presentation Transcript


1. Unsupervised Learning
• Clustering
  • Unsupervised classification, that is, classification without the class attribute
  • The goal is to discover the classes
• Association Rule Discovery
  • The goal is to discover correlations

2. The Clustering Process
• Pattern representation
• Definition of a pattern proximity measure
• Clustering
• Data abstraction
• Cluster validation

3. Pattern Representation
• Number of classes
• Number of available patterns
• Shape of patterns: circles, ellipses, squares, etc.
• Feature selection
  • Can we use wrappers and filters?
• Feature extraction
  • Produce new features, e.g., principal component analysis (PCA)

4. Pattern Proximity
• We want clusters of instances that are similar to each other but dissimilar to instances in other clusters
• This requires a similarity measure. In the continuous case:
  • The Euclidean measure works well for compact, isolated clusters
  • The squared Mahalanobis distance alleviates problems with correlated attributes
  • Many more measures exist (see the sketch below)
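
A minimal sketch of the two continuous measures named above, assuming numeric instances stored as NumPy arrays; the data and function names are illustrative, not from the slides:

```python
# Sketch: Euclidean vs. squared Mahalanobis distance (illustrative names).
import numpy as np

def euclidean(a, b):
    # Works well for compact, isolated clusters.
    return float(np.sqrt(np.sum((a - b) ** 2)))

def squared_mahalanobis(a, b, cov_inv):
    # Rescales by the inverse covariance, compensating for correlated attributes.
    d = a - b
    return float(d @ cov_inv @ d)

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
print(euclidean(X[0], X[1]))
print(squared_mahalanobis(X[0], X[1], cov_inv))
```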

5. Pattern Proximity (continued)
• For nominal attributes, a different proximity measure is needed

6. Clustering Techniques
• Hierarchical
  • Single Link
  • Complete Link
  • CobWeb
• Partitional
  • Square Error: K-means
  • Mixture Maximization: Expectation Maximization

7. Technique Characteristics
• Agglomerative vs. Divisive
  • Agglomerative: each instance starts as its own cluster and the algorithm merges clusters
  • Divisive: begins with all instances in one cluster and divides it up
• Hard vs. Fuzzy
  • Hard clustering assigns each instance to exactly one cluster, whereas fuzzy clustering assigns each instance a degree of membership in every cluster

8. More Characteristics
• Monothetic vs. Polythetic
  • Polythetic: all attributes are used simultaneously, e.g., to calculate distance (most algorithms)
  • Monothetic: attributes are considered one at a time
• Incremental vs. Non-Incremental
  • With large data sets it may be necessary to consider only part of the data at a time (data mining)
  • Incremental algorithms work instance by instance

9. Hierarchical Clustering
• [Figure: dendrogram over instances A–G, with a similarity axis showing the level at which clusters merge]

10. Hierarchical Algorithms
• Single-link
  • The distance between two clusters is the minimum of the distances between all pairs of instances, one from each cluster
  • More versatile
  • Produces (sometimes too) elongated clusters
• Complete-link
  • The distance between two clusters is the maximum of the distances between all pairs of instances, one from each cluster
  • Produces tightly bound, compact clusters
  • Often more useful in practice
• (See the sketch below)
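
As a sketch, the two linkage criteria differ only in how pairwise distances are aggregated; cluster contents and names here are illustrative:

```python
# Sketch: single-link vs. complete-link distance between two clusters.
import numpy as np
from itertools import product

def single_link(c1, c2):
    # Minimum over all pairs: tends to produce elongated clusters.
    return min(np.linalg.norm(a - b) for a, b in product(c1, c2))

def complete_link(c1, c2):
    # Maximum over all pairs: tends to produce compact clusters.
    return max(np.linalg.norm(a - b) for a, b in product(c1, c2))

c1 = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
c2 = [np.array([3.0, 0.0]), np.array([9.0, 0.0])]
print(single_link(c1, c2))    # 2.0
print(complete_link(c1, c2))  # 9.0
```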

11. Example: Clusters Found
• [Figure: the same two-cluster data set as recovered by single-link (elongated clusters) and complete-link (compact clusters)]

12. Partitional Clustering
• Outputs a single partition of the data into clusters
• Good for large data sets
• Determining the number of clusters is a major challenge

13. K-Means
• Predetermined number of clusters
• Start with seed clusters of one element each
• [Figure: data points with the initial seeds marked]

14. Assign Instances to Clusters
• [Figure: each instance assigned to its nearest seed]

15. Find New Centroids
• [Figure: centroids recomputed as the means of the assigned instances]

16. New Clusters
• [Figure: instances reassigned to the new centroids]

17. Discussion: k-means
• Applicable to fairly large data sets
• Sensitive to the initial centers
  • Use other heuristics to find good initial centers
• Converges to a local optimum
• Specifying the number of centers is very subjective
• (See the sketch below)
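
A minimal k-means sketch (Lloyd's algorithm) showing the assign/re-center loop from the preceding slides; the names and defaults are illustrative, and in practice a library implementation would be used:

```python
# Sketch: k-means with random one-element seed clusters.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # seed clusters
    for _ in range(n_iter):
        # Assign each instance to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its assigned instances.
        new_centers = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break  # converged -- to a local optimum, not necessarily global
        centers = new_centers
    return labels, centers

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
labels, centers = kmeans(X, k=2)
print(labels, centers)
```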

18. Clustering in Weka
• Clustering algorithms in Weka:
  • K-Means
  • Expectation Maximization (EM)
  • Cobweb: hierarchical, incremental, and agglomerative

19. CobWeb
• Algorithm (main) characteristics:
  • Hierarchical and incremental
  • Uses category utility to decide where an instance belongs:
    $CU(C_1,\dots,C_k) = \frac{1}{k}\sum_{l=1}^{k} \Pr[C_l] \sum_i \sum_j \left(\Pr[a_i = v_{ij} \mid C_l]^2 - \Pr[a_i = v_{ij}]^2\right)$
    where the outer sum runs over the k clusters and j ranges over all possible values $v_{ij}$ of attribute $a_i$
• Why divide by k?

20. Category Utility
• If each instance is in its own cluster, every within-cluster probability $\Pr[a_i = v_{ij} \mid C_l]$ is 0 or 1, and the category utility function becomes
  $CU = \frac{m - \sum_i \sum_j \Pr[a_i = v_{ij}]^2}{k}$, where m is the number of attributes
• Without the division by k it would always be best for each instance to have its own cluster: overfitting! (See the sketch below)
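
A sketch of the category utility computation for nominal attributes, following the formula on slide 19; the data layout (clusters as lists of attribute-value tuples) is an assumption for illustration, not Weka's implementation:

```python
# Sketch: category utility CU for a clustering of nominal instances.
from collections import Counter

def category_utility(clusters):
    k = len(clusters)
    instances = [x for c in clusters for x in c]
    n, m = len(instances), len(instances[0])
    # Baseline term: sum over attributes i of sum_j P(a_i = v_ij)^2.
    base = sum(sum((cnt / n) ** 2
                   for cnt in Counter(x[i] for x in instances).values())
               for i in range(m))
    total = 0.0
    for cluster in clusters:
        # Within-cluster term: sum over i of sum_j P(a_i = v_ij | C_l)^2.
        within = sum(sum((cnt / len(cluster)) ** 2
                         for cnt in Counter(x[i] for x in cluster).values())
                     for i in range(m))
        total += (len(cluster) / n) * (within - base)
    return total / k  # dividing by k penalizes one-instance-per-cluster solutions

clusters = [[("sunny", "hot"), ("sunny", "mild")], [("rainy", "cool")]]
print(category_utility(clusters))
```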

21. The Weather Problem
• [Table: the standard weather data set — 14 instances with attributes outlook, temperature, humidity, and windy]

22. Weather Data (without Play)
• Label the instances a, b, ..., n
• Start by putting the first instance, a, in its own cluster
• Add the next instance, b, in its own cluster
• [Figure: hierarchy with leaves a and b under the root]

23. Adding the Third Instance
• Evaluate the category utility of adding instance c to one of the two existing clusters versus adding it as its own cluster
• [Figure: the three candidate hierarchies — c merged with a, c merged with b, and c in its own cluster — with the highest-utility choice marked]

24. Adding Instance f
• f is the first instance not to get its own cluster
• Look at the two instances:
  • e: Rainy, Cool, Normal, FALSE
  • f: Rainy, Cool, Normal, TRUE
• Quite similar!
• [Figure: hierarchy with leaves a, b, c, d and a node containing e and f]

25. Add Instance g
• Look at the instances:
  • e: Rainy, Cool, Normal, FALSE
  • f: Rainy, Cool, Normal, TRUE
  • g: Overcast, Cool, Normal, TRUE
• [Figure: hierarchy with leaves a, b, c, d and a subtree containing e, f, and g]

26. Add Instance h
• Look at the instances:
  • a: Sunny, Hot, High, FALSE
  • d: Rainy, Mild, High, FALSE
  • h: Sunny, Mild, High, FALSE
• One of a and d is the best matching node and the other the runner up
• Rearrange: the best matching node and the runner up are merged into a single cluster before h is added
• (Splitting is also possible)
• [Figure: hierarchy with a merged node containing a, d, and h alongside b, c, e, f, g]

27. Final Hierarchy
• [Figure: the final CobWeb hierarchy over instances a–n]
• What next?

28. Dendrogram → Clusters
• [Figure: the hierarchy cut to yield clusters; one cluster contains a, b, c, d, h, k, and l]
• What do a, b, c, d, h, k, and l have in common?

29. Numerical Attributes
• Assume a normal distribution
• Problems arise with zero variance!
• The acuity parameter imposes a minimum variance (see the sketch below)
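
A one-line illustration of the acuity fix; the function name and default value are placeholders, not Weka's:

```python
# Sketch: impose a minimum standard deviation so the normal density never
# degenerates when a node contains a single value for a numeric attribute.
import numpy as np

def node_std(values, acuity=1.0):
    return max(float(np.std(values)), acuity)  # never below the acuity

print(node_std([72.0]))        # zero variance clamped to 1.0
print(node_std([64.0, 83.0]))  # ordinary standard deviation, 9.5
```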

30. Hierarchy Size (Scalability)
• CobWeb may create a very large hierarchy
• The cutoff parameter is used to suppress growth: if the category utility gained by adding a node falls below the cutoff, the node is cut off

31. Discussion
• Advantages
  • Incremental → scales to a large number of instances
  • Cutoff → limits the size of the hierarchy
  • Handles mixed attributes
• Disadvantages
  • Incremental → sensitive to the order of the instances?
  • Arbitrary choices of parameters:
    • the division by k
    • the artificial minimum value (acuity) for the variance of numeric attributes
    • the ad hoc cutoff value

32. Probabilistic Perspective
• Find the most likely set of clusters given the data
• Compute the probability of each instance belonging to each cluster
• Assumption: instances are drawn from one of several distributions
• Goal: estimate the parameters of these distributions
• Usually the distributions are assumed to be normal

33. Mixture Resolution
• A mixture is a set of k probability distributions
  • These represent the k clusters
  • Each gives the probability that an instance takes certain attribute values given that it is in the cluster
• What is the probability that an instance belongs to a given cluster (or distribution)?

34. Two-Cluster Mixture Model
• [Figure: two overlapping normal distributions, cluster A and cluster B, over one numeric attribute]
• Given some data, how can you determine the parameters $\mu_A, \sigma_A, \mu_B, \sigma_B$ and the mixing probability $p_A$?

35. Problems
• If we knew which cluster each instance came from, we could estimate these parameters
• If we knew the parameters, we could calculate the probability that an instance belongs to each cluster

36. EM Algorithm
• Expectation Maximization (EM):
  • Start with initial values for the parameters
  • Calculate the cluster probabilities for each instance (expectation)
  • Re-estimate the values of the parameters (maximization)
  • Repeat
• A general-purpose maximum likelihood estimation algorithm for missing data
• Can also be used to train Bayesian networks (later)
• (See the sketch below)
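
A sketch of EM for the two-cluster, one-attribute mixture of slides 34–35, following the loop above; the initialization, iteration count, and variance floor are illustrative choices:

```python
# Sketch: EM for a mixture of two 1-D normal distributions.
import numpy as np

def em_two_gaussians(x, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    mu = np.array(rng.choice(x, size=2, replace=False))  # initial means
    sigma = np.array([np.std(x), np.std(x)])
    p = np.array([0.5, 0.5])                             # mixing weights
    for _ in range(n_iter):
        # Expectation: probability of each cluster for each instance.
        dens = np.array([
            p[j] / (sigma[j] * np.sqrt(2 * np.pi))
            * np.exp(-0.5 * ((x - mu[j]) / sigma[j]) ** 2)
            for j in range(2)
        ])
        resp = dens / dens.sum(axis=0)
        # Maximization: re-estimate parameters from the weighted instances.
        for j in range(2):
            w = resp[j]
            mu[j] = (w * x).sum() / w.sum()
            sigma[j] = max(np.sqrt((w * (x - mu[j]) ** 2).sum() / w.sum()), 1e-6)
            p[j] = w.mean()
    return mu, sigma, p

x = np.concatenate([np.random.default_rng(1).normal(0, 1, 100),
                    np.random.default_rng(2).normal(6, 1, 100)])
print(em_two_gaussians(x))
```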

37. Beyond Normal Models
• More than one class: straightforward
• More than one numeric attribute:
  • Easy if the attributes are assumed independent
  • If the attributes are dependent, treat them jointly using the bivariate normal
• Nominal attributes: no more normal distribution!

38. EM Using Weka
• Options (see the example below):
  • numClusters: set the number of clusters (default = -1 selects it automatically)
  • maxIterations: maximum number of iterations
  • seed: random number seed
  • minStdDev: minimum allowable standard deviation
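
A hedged example of driving Weka's EM clusterer from the command line via Python. The single-letter flags are how these options are commonly exposed in Weka's CLI, but they and the file paths are assumptions here; verify against your Weka version with `-h`:

```python
# Sketch: invoking weka.clusterers.EM with the options listed above.
# Assumes weka.jar and weather.arff exist in the working directory.
import subprocess

subprocess.run([
    "java", "-cp", "weka.jar", "weka.clusterers.EM",
    "-t", "weather.arff",  # training data
    "-N", "2",             # numClusters (-1 would select it automatically)
    "-I", "100",           # maxIterations
    "-S", "100",           # seed
    "-M", "1e-6",          # minStdDev
], check=True)
```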

39. Other Clustering Approaches
• Artificial Neural Networks (ANN)
• Random search
  • Genetic Algorithms (GA): e.g., used to find the initial centroids for k-means
  • Simulated Annealing (SA)
  • Tabu Search (TS)
• Support Vector Machines (SVM)
• GA and SVM will be discussed later

40. Applications
• Image segmentation
• Object and character recognition
• Data mining:
  • Stand-alone, to gain insight into the data
  • As a preprocessing step before classification, which then operates on the detected clusters

41. DM Clustering Challenges
• Data mining deals with large databases
  • Scalability with respect to the number of instances
  • Using a random sample is possible but may introduce bias
• Dealing with mixed data
  • Many algorithms only make sense for numeric data
• High-dimensional problems
  • Can the algorithm handle many attributes?
  • How do we interpret a cluster in high dimensions?

42. Other (General) Challenges
• Shape of clusters
• Minimum domain knowledge (e.g., knowing the number of clusters)
• Noisy data
• Insensitivity to instance order
• Interpretability and usability

43. Clustering for DM
• The main issue is scalability to large databases
• Many algorithms have been developed for scalable clustering:
  • Partitional methods: CLARA, CLARANS
  • Hierarchical methods: AGNES, DIANA, BIRCH, CURE, Chameleon

44. Practical Partitional Clustering Algorithms
• Classic k-Means (1967)
• Work from 1990 and later: k-Medoids
  • Uses the medoid instead of the centroid (see the sketch below)
  • Less sensitive to outliers and noise
  • Computations are more costly
  • PAM (Partitioning Around Medoids) algorithm
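
The medoid update is the step that distinguishes k-medoids from k-means; a sketch, with illustrative names:

```python
# Sketch: a medoid is the actual instance minimizing total distance to the
# rest of its cluster -- O(n^2) per cluster, but robust to outliers.
import numpy as np

def medoid(cluster):
    dists = np.linalg.norm(cluster[:, None, :] - cluster[None, :, :], axis=2)
    return cluster[dists.sum(axis=1).argmin()]

cluster = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [50.0, 50.0]])
print(medoid(cluster))  # the outlier cannot drag the representative away
```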

45. Large-Scale Problems
• CLARA: Clustering LARge Applications
  • Select several random samples of instances
  • Apply PAM to each sample
  • Return the best clusters
• CLARANS:
  • Similar to CLARA
  • Draws samples randomly while searching
  • More effective than PAM and CLARA

46. Hierarchical Methods
• BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies
  • Clustering feature (CF): a triplet (number of points, their linear sum, their squared sum) summarizing a subcluster
  • Clustering feature tree: a height-balanced tree that stores the clustering features

47. BIRCH Mechanism
• Phase I:
  • Scan the database to build an initial CF tree
  • A multilevel compression of the data
• Phase II:
  • Apply a selected clustering algorithm to the leaf nodes of the CF tree
• BIRCH has been found to be very scalable

48. Conclusion
• The use of clustering in data mining practice seems to be somewhat limited due to scalability problems
• A more commonly used form of unsupervised learning: Association Rule Discovery

49. Association Rule Discovery
• Aims to discover interesting correlations or other relationships in large databases
• Finds rules of the form: if A and B then C and D
• Which attributes will be included in the relation is not known in advance

50. Mining Association Rules
• Association rules are similar to classification rules, so can we use the same procedure?
  • Every attribute can appear on the right-hand side, not just the class
  • The procedure would have to be applied to every possible expression on the right-hand side
  • This yields a huge number of rules → infeasible
• We only want rules with high coverage/support (see the sketch below)
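
A sketch of support (coverage) and its standard companion measure, confidence, over transactions represented as item sets; the data and names are illustrative:

```python
# Sketch: support and confidence for a rule "if A and B then C".
def support(transactions, itemset):
    # Fraction of transactions containing the whole itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    # Of the transactions matching the antecedent, the fraction that
    # also match the consequent.
    return (support(transactions, antecedent | consequent)
            / support(transactions, antecedent))

baskets = [{"A", "B", "C"}, {"A", "B"}, {"B", "C"}, {"A", "B", "C", "D"}]
print(support(baskets, {"A", "B"}))            # 0.75
print(confidence(baskets, {"A", "B"}, {"C"}))  # 0.666...
```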
