
Advanced Artificial Intelligence


Presentation Transcript


  1. Advanced Artificial Intelligence Lecture 4: Clustering And Recommender Systems

  2. Outline • Clustering • K-Means • EM • K-Means++ • Mean-Shift • Spectral Clustering • Recommender Systems • Collaborative Filtering

  3. K-Means Many data points, no labels

  4. K-Means • Choose a fixed number of clusters • Choose cluster centers and point-cluster allocations to minimize error; this can't be done by exhaustive search, because there are too many possible allocations • Algorithm: fix cluster centers, allocate points to the closest cluster; fix allocations, compute the best cluster centers • x could be any set of features for which we can compute a distance (careful about scaling)
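
A minimal NumPy sketch of the alternation described above (function and variable names are my own, not from the slides):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain k-means: alternate between assigning points to the nearest
    center and recomputing each center as the mean of its points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
    for _ in range(n_iters):
        # Assignment step: allocate each point to the closest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recompute each center as the mean of its points.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```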

  5. K-Means

  6. K-Means * From Marc Pollefeys COMP 256 2003

  7. K-Means • Is an approximation to EM • Model (hypothesis space): Mixture of N Gaussians • Latent variables: Correspondence of data and Gaussians • We notice: • Given the mixture model, it's easy to calculate the correspondence • Given the correspondence, it's easy to estimate the mixture model

  8. Expectation Maximization: Idea • Data generated from a mixture of Gaussians • Latent variables: Correspondence between Data Items and Gaussians

  9. Generalized K-Means (EM)

  10. Gaussians

  11. ML Fitting Gaussians

  12. Learning a Gaussian Mixture (with known covariance): E-Step and M-Step
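
The slide's formulas are images and did not survive extraction. The standard textbook form for k Gaussians with a known, shared variance σ² and equal priors (an assumption about which variant the slide shows) is:

```latex
% E-step: expected responsibility of Gaussian j for sample x_i
E[z_{ij}] = \frac{\exp\left(-\lVert x_i - \mu_j \rVert^2 / 2\sigma^2\right)}
                 {\sum_{l=1}^{k} \exp\left(-\lVert x_i - \mu_l \rVert^2 / 2\sigma^2\right)}

% M-step: re-estimate each mean as a responsibility-weighted average
\mu_j \leftarrow \frac{\sum_{i=1}^{n} E[z_{ij}]\, x_i}{\sum_{i=1}^{n} E[z_{ij}]}
```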

  13. Expectation Maximization • Converges! • Proof [Neal/Hinton, McLachlan/Krishnan]: • E/M step does not decrease data likelihood • Converges at local minimum or saddle point • But subject to local minima

  14. Practical EM • Number of Clusters unknown • Suffers (badly) from local minima • Algorithm: • Start new cluster center if many points “unexplained” • Kill cluster center that doesn’t contribute • (Use AIC/BIC criterion for all this, if you want to be formal)
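
One way to make the AIC/BIC advice concrete, sketched with scikit-learn's GaussianMixture (the library choice and the helper name are mine, not the lecture's):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def choose_k_by_bic(X, k_max=10, seed=0):
    """Fit mixtures with 1..k_max components and keep the one with the
    lowest BIC; a simple, formal stand-in for starting/killing clusters."""
    best_k, best_bic, best_model = None, np.inf, None
    for k in range(1, k_max + 1):
        gmm = GaussianMixture(n_components=k, n_init=5, random_state=seed).fit(X)
        bic = gmm.bic(X)
        if bic < best_bic:
            best_k, best_bic, best_model = k, bic, gmm
    return best_k, best_model
```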

  15. K-Means++ • Can we prevent arbitrarily bad local minima? • Randomly choose the first center. • Pick each new center with probability proportional to its contribution to the total error (its squared distance to the nearest center chosen so far). • Repeat until k centers have been chosen. • Expected error = O(log k) * optimal

  16. K-Means++ • In the second step, compute the distance D(x) between each data point and its nearest seed point (cluster center), giving a set D(1), D(2), ..., D(n), where n is the size of the data set. • To avoid being misled by noise, do not simply select the point with the largest D value; instead, favor points with larger values by sampling, and use the chosen data point as the next seed point. • Take a random value and use the D values as weights to select the next "seed point".
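
A minimal sketch of this seeding procedure, with D(x)² weighting and roulette-wheel selection via the cumulative sum (names are mine):

```python
import numpy as np

def kmeanspp_init(X, k, seed=0):
    """k-means++ seeding: first center uniform at random, each later center
    sampled with probability proportional to D(x)^2, the squared distance
    to the nearest center chosen so far (roulette-wheel selection)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    while len(centers) < k:
        d2 = np.min(
            [np.sum((X - c) ** 2, axis=1) for c in centers], axis=0
        )                                    # D(x)^2 for every point
        probs = d2 / d2.sum()
        cum = np.cumsum(probs)               # cumulative sum, as in the table on the next slides
        r = rng.random()                     # "take a random value"
        centers.append(X[np.searchsorted(cum, r)])
    return np.array(centers)
```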

  17. K-Means++ • There are 8 samples in the data set. • The distribution and corresponding serial numbers are shown below.

  18. K-Means++ • Suppose point 6 is selected as the first initial cluster center; then D(x) for each sample and the probability of selecting it as the second cluster center at step 2 are shown in the following table (not reproduced). • P(x) is the probability of each sample being selected as the next cluster center. The Sum row is the cumulative sum of the probabilities P(x), which is used to select the second cluster center by the roulette method.

  19. Mean-Shift • Given n sample points x_i, i = 1, ..., n, in d-dimensional space R^d and a point x in that space, the basic form of the mean-shift vector M_h(x) is defined as below (the slide's formula is reconstructed after this list). • S_h denotes the sphere of radius h around x, i.e. the set of data points whose distance to x is less than h. • k is the number of the n sample points x_i that fall inside the S_h region. • After computing the drift vector, the position of the sphere's center x is updated by x = x + M_h. • The result is a vector that moves the center of the sphere toward the region of highest density in the data set.
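
The slide's formula is an image; the basic mean-shift vector it refers to is usually written as (my reconstruction):

```latex
M_h(x) = \frac{1}{k} \sum_{x_i \in S_h(x)} (x_i - x),
\qquad
S_h(x) = \{\, x_i : \lVert x_i - x \rVert \le h \,\}
```

And a minimal sketch of the update loop with a flat kernel (names are mine):

```python
import numpy as np

def mean_shift_mode(X, x, h, tol=1e-6, max_iters=200):
    """Iterate x = x + M_h(x) with a flat kernel until the shift is tiny;
    the fixed point is a mode (local density maximum) of the data."""
    x = np.asarray(x, dtype=float)
    for _ in range(max_iters):
        in_window = np.linalg.norm(X - x, axis=1) <= h   # points inside the sphere S_h
        if not np.any(in_window):
            break
        shift = X[in_window].mean(axis=0) - x            # M_h(x) = window mean minus x
        x = x + shift                                    # move the window center
        if np.linalg.norm(shift) < tol:
            break
    return x
```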

  20–26. Mean-Shift: step-by-step illustration of the mean-shift procedure (figures not reproduced). Source of slides: http://vision.stanford.edu/teaching/cs131_fall1314_nope/lectures/lecture13_kmeans_cs131.pdf

  27. Mean-Shift Clustering

  28. Mean-Shift Bandwidth: radius h

  29. Mean-Shift Bandwidth: radius h

  30. Mean-Shift Clustering • Cluster: all data points in the attraction basin of a mode • Attraction basin: the region for which all trajectories lead to the same mode. Source of slide: http://vision.stanford.edu/teaching/cs131_fall1314_nope/lectures/lecture13_kmeans_cs131.pdf

  31. Spectral Clustering

  32. Spectral Clustering

  33. Spectral Clustering: Overview • Data → Similarities → Block-Detection

  34. Eigenvectors and Blocks • Block matrices have block eigenvectors: in the slide's example the eigensolver gives λ1 = λ2 = 2 and λ3 = λ4 = 0 • Near-block matrices have near-block eigenvectors: the eigensolver gives λ1 = λ2 = 2.02 and λ3 = λ4 = -0.02 (matrices not reproduced) [Ng et al., NIPS 02]

  35. Spectral Space • Can put items into blocks by eigenvectors • Resulting clusters are independent of row ordering (scatter plots in the (e1, e2) eigenvector coordinates not reproduced)
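
A small NumPy illustration of this point, not the slides' exact example: build a near-block affinity matrix, take its leading eigenvectors, and read the clusters off the rows in that spectral space (the matrix values are made up):

```python
import numpy as np

def spectral_embed(A, n_clusters):
    """Return the rows of the leading eigenvectors of a symmetric affinity
    matrix A; block structure in A shows up as well-separated groups of
    points in this spectral space."""
    eigvals, eigvecs = np.linalg.eigh(A)      # eigenvalues in ascending order
    return eigvecs[:, -n_clusters:]           # leading eigenvectors e1, e2, ...

# Toy near-block affinity: two groups of items with strong within-group
# affinity and weak between-group affinity.
A = np.array([
    [1.00, 0.98, 0.02, 0.01],
    [0.98, 1.00, 0.01, 0.02],
    [0.02, 0.01, 1.00, 0.97],
    [0.01, 0.02, 0.97, 1.00],
])
embedding = spectral_embed(A, n_clusters=2)
# Rows 0-1 and rows 2-3 land in two distinct groups in the (e1, e2) plane,
# regardless of how the rows of A are ordered.
```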

  36. The Spectral Advantage • The key advantage of spectral clustering is the spectral space representation:

  37. Measuring Affinity • Intensity • Distance • Texture (affinity formulas not reproduced)

  38. Scale affects affinity
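
The affinity formulas on slide 37 are images; a common choice, and an assumption here rather than something taken from the transcript, is a Gaussian affinity whose scale σ is exactly what slide 38 is about:

```latex
\text{aff}(x, y) = \exp\left(-\frac{d(x, y)^2}{2\sigma^2}\right)
```

Here d(x, y) is the difference in intensity, position, or texture descriptor (e.g. between two pixels), and σ sets the scale at which two items still count as similar.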

  39. Clustering

  40. Recommender Systems

  41. Collaborative Filtering • Memory-based • This approach uses user rating data to compute the similarity between users or items. Typical examples of this approach are neighborhood-based CF and item-based/user-based top-N recommendation. • Model-based • In this approach, models are developed using different algorithms to predict users' ratings of unrated items, e.g. matrix factorization. • Hybrid • A number of applications combine the memory-based and the model-based CF algorithms.
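
A minimal sketch of the model-based route (plain SGD matrix factorization on the observed entries; hyperparameters and names are my own):

```python
import numpy as np

def factorize(ratings, n_factors=10, lr=0.01, reg=0.1, n_epochs=50, seed=0):
    """Learn user/item latent factors so that P[u] @ Q[i] approximates the
    observed rating r_ui; missing entries (NaN) are skipped during training."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    P = 0.1 * rng.standard_normal((n_users, n_factors))   # user factors
    Q = 0.1 * rng.standard_normal((n_items, n_factors))   # item factors
    users, items = np.nonzero(~np.isnan(ratings))          # observed entries
    for _ in range(n_epochs):
        for u, i in zip(users, items):
            err = ratings[u, i] - P[u] @ Q[i]
            p_u = P[u].copy()
            P[u] += lr * (err * Q[i] - reg * P[u])          # gradient step on user factors
            Q[i] += lr * (err * p_u - reg * Q[i])           # gradient step on item factors
    return P @ Q.T                                          # dense matrix of predicted ratings
```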

  42. Collaborative Filtering • The workflow of a collaborative filtering system: • A user expresses his or her preferences by rating items (e.g. books, movies or CDs) of the system. These ratings can be viewed as an approximate representation of the user's interest in the corresponding domain. • The system matches this user's ratings against other users' and finds the people with the most "similar" tastes. • For similar users, the system recommends items that the similar users have rated highly but that have not yet been rated by this user (the absence of a rating is often taken to indicate unfamiliarity with the item).

  43. Collaborative Filtering • Assumption: users with similar taste in the past will have similar taste in the future • Requires only a matrix of ratings → applicable in many domains • Widely used in practice

  44. Collaborative Filtering • Input: matrix of user-item ratings (with missing values, often very sparse) • Output: predictions for the missing values

  45. Collaborative Filtering • Consider user   x   • Find set N of other users are similar to x’s rating • Estimate x’s ratings based on ratings of user in N

  46. Similar Users • Jaccard similarity measure • Cosine similarity measure • Pearson correlation coefficient
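
A minimal sketch of the three measures for a pair of user rating vectors, with unrated items stored as NaN (conventions and names are mine):

```python
import numpy as np

def jaccard(a, b):
    """Jaccard similarity on the sets of items each user has rated."""
    ra, rb = ~np.isnan(a), ~np.isnan(b)
    return np.sum(ra & rb) / np.sum(ra | rb)

def cosine(a, b):
    """Cosine similarity, treating unrated items as 0."""
    x, y = np.nan_to_num(a), np.nan_to_num(b)
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

def pearson(a, b):
    """Pearson correlation on co-rated items: cosine of mean-centered ratings."""
    both = ~np.isnan(a) & ~np.isnan(b)
    x = a[both] - a[both].mean()
    y = b[both] - b[both].mean()
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
```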

  47. Collaborative Filtering

  48. Collaborative Filtering

  49. Collaborative Filtering • The user-based top-N recommendation algorithm uses a similarity-based vector model to identify the k most similar users to an active user. • After the k most similar users are found, their corresponding user-item matrices are aggregated to identify the set of items to be recommended.
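
A minimal sketch of this user-based top-N procedure, reusing a similarity function like the ones above (aggregation by summing the neighbors' ratings is one simple choice; all names are mine):

```python
import numpy as np

def top_n_recommend(R, user, sim_fn, k=5, n=10):
    """User-based top-N: find the k users most similar to `user`, aggregate
    the items they rated, and recommend the n best-scoring items that the
    active user has not rated yet. R is a users x items matrix with NaN
    for missing ratings."""
    sims = np.array([
        sim_fn(R[user], R[other]) if other != user else -np.inf
        for other in range(R.shape[0])
    ])
    sims = np.where(np.isnan(sims), -np.inf, sims)   # ignore users with no usable similarity
    neighbors = np.argsort(sims)[-k:]                # indices of the k most similar users
    scores = np.nansum(R[neighbors], axis=0)         # aggregate neighbors' ratings per item
    scores[~np.isnan(R[user])] = -np.inf             # exclude items the user already rated
    return np.argsort(scores)[::-1][:n]              # item indices of the top-n recommendations
```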
