
Unsupervised Machine Learning


Presentation Transcript


  1. Unsupervised Machine Learning Claudio Duran, Cannistraci lab, BIOTEC Robert Haase, Myers lab, MPI CBG Mahmood Nazari, ABX-Cro / Schroeder lab, BIOTEC Martin Weigert, Myers lab, MPI CBG https://www.biotec.tu-dresden.de/research/cannistraci/lecture.html

  2. Python RECAP How to read a .csv file?
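
A minimal sketch of the pandas answer to this question, assuming a file named "data.csv" (hypothetical name) in the working directory:

    import pandas as pd

    df = pd.read_csv("data.csv")  # read the .csv file into a DataFrame
    print(df.head())              # inspect the first rows to check the content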

  3. Python RECAP Manipulating lists and pandas DataFrames. Indices: 0 1 2 3 4 5 6 7 8 9; myList = [2,7,3,9,8,1,5,6,0,4]. myList[1:7] = ? → [7,3,9,8,1,5]; myList[1:7:2] = ? → [7,9,1]; myList[5:] = ? → [1,5,6,0,4]. For a 2-D table X, X[i,j] selects row i and column j.
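
A quick sketch to check the slicing answers above and the row/column indexing, assuming pandas is installed (the DataFrame values are made up for illustration):

    import pandas as pd

    myList = [2, 7, 3, 9, 8, 1, 5, 6, 0, 4]
    print(myList[1:7])    # [7, 3, 9, 8, 1, 5]  -> indices 1 to 6
    print(myList[1:7:2])  # [7, 9, 1]           -> every second element of that slice
    print(myList[5:])     # [1, 5, 6, 0, 4]     -> from index 5 to the end

    df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
    print(df.iloc[0, 1])  # element at row i=0, column j=1 -> 2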

  4. Today... • What is machine learning? • Where do we find it? • Unsupervised machine learning • Clustering • Dimensionality Reduction • Measurements of performance (Classification) • Exercises • MNIST

  5. Machine learning The use of AI to give systems the ability to learn from experience. • Search engines • Anti-spam software • Credit card transactions • Digital cameras (face recognition)

  6. Machine learning Machine learning splits into three branches: Supervised learning (labels/response known) → regression, classification; Unsupervised learning (labels/response unknown) → dimensionality reduction, clustering; Reinforcement learning → learning by reward (e.g. the game of Go).

  7. Unsupervised machine learning Is there an informative way to visualize the data? Can we detect subgroups or patterns among the variables or observations?

  8. Dimensionality Reduction Objective: to reduce the number of variables under consideration. How? The most common dimensionality reduction technique is Principal Component Analysis (PCA).

  9. PCA Finds a sequence of linear combinations of the variables that have maximal variance and are mutually uncorrelated.

  10. PCA in Python Never forget to import the library. Create a PCA object with the desired number of components; fit_transform is the function to call to apply the PCA transformation.
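
A minimal sketch of the PCA step described on the slide, assuming scikit-learn is installed and X is a data matrix of shape (n_samples, n_features); the random data here is only a placeholder:

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.rand(100, 50)           # placeholder data; use your own matrix

    pca = PCA(n_components=2)             # PCA object with the desired number of components
    X_pca = pca.fit_transform(X)          # fit the model and project the data

    print(X_pca.shape)                    # (100, 2)
    print(pca.explained_variance_ratio_)  # variance explained by each component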

  11. PCA in python

  12. ISOMAP Non-linear dimensionality reduction

  13. ISOMAP The Python usage follows the same pattern as PCA: create the object with the desired number of components, then call fit_transform.
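
A minimal sketch, assuming the same placeholder data matrix X as in the PCA example; Isomap follows the same create-then-fit_transform pattern:

    import numpy as np
    from sklearn.manifold import Isomap

    X = np.random.rand(100, 50)                  # placeholder data

    iso = Isomap(n_components=2, n_neighbors=5)  # non-linear embedding into 2 dimensions
    X_iso = iso.fit_transform(X)                 # compute the embedding

    print(X_iso.shape)                           # (100, 2)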

  14. Clustering The process of grouping similar objects into the same class. How many clusters?

  15. Clustering

  16. K-Means K-Means is an algorithm that groups objects, based on their attributes, into K groups. 1. Place K centroids at random positions. 2. Assign each point x to the closest centroid (minimum distance D). 3. Recompute each centroid as the average of the points currently assigned to its cluster C. 4. Repeat until convergence (no cluster assignment changes).

  17. K-Means in Python fit is the function that applies the clustering method; the number of clusters we are looking for, K, is passed as an argument.
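
A minimal sketch of the K-Means call described on the slide, assuming scikit-learn and a data matrix X of shape (n_samples, n_features):

    import numpy as np
    from sklearn.cluster import KMeans

    X = np.random.rand(200, 2)      # placeholder data

    kmeans = KMeans(n_clusters=3)   # K, the number of clusters we are looking for
    kmeans.fit(X)                   # fit applies the clustering method

    print(kmeans.labels_[:10])      # cluster assignment of the first 10 points
    print(kmeans.cluster_centers_)  # coordinates of the K centroids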

  18. K-Means

  19. K-Means

  20. DBSCAN Short for Density-Based Spatial Clustering of Applications with Noise. Density-based clustering locates regions of high density that are separated from one another by regions of low density. Density = number of points within a specified radius (Eps).

  21. DBSCAN Concepts: Core point: a point that has more than a specified number of points (MinPts) within Eps. Border point: a point that has fewer than MinPts within Eps, but lies in the neighbourhood of a core point. Noise point: any point that is neither a core point nor a border point. If two core points are in each other's neighbourhood (within a distance Eps of one another), they are put in the same cluster.

  22. DBSCAN in Python Eps and MinPts are passed as input arguments; the same fit function as for K-Means applies the clustering.
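
A minimal sketch of the DBSCAN call described on the slide, assuming scikit-learn, where eps and min_samples correspond to Eps and MinPts; the values chosen here are arbitrary and depend on the data:

    import numpy as np
    from sklearn.cluster import DBSCAN

    X = np.random.rand(200, 2)           # placeholder data

    db = DBSCAN(eps=0.1, min_samples=5)  # Eps and MinPts as input arguments
    db.fit(X)                            # same fit interface as for K-Means

    print(db.labels_[:10])               # cluster labels; noise points are labelled -1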

  23. DBSCAN

  24. DBSCAN

  25. Performance measurements Usually applied in binary classification tasks (supervised learning): - Accuracy - Sensitivity (or recall, or true positive rate [TPR]) - Specificity (or true negative rate [TNR]) - Precision (or positive predictive value) - F-score

  26. Performance measurements [Diagram: overlap between all positive samples and the samples predicted as positive]

  27. Performance measurements Accuracy, sensitivity (recall), precision, specificity, and F-score. How do we check clustering performance?
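
A minimal sketch of the standard definitions behind these metrics, assuming the binary confusion-matrix counts TP, TN, FP and FN are already known (the numbers below are made up):

    def classification_metrics(tp, tn, fp, fn):
        accuracy    = (tp + tn) / (tp + tn + fp + fn)
        sensitivity = tp / (tp + fn)  # recall / true positive rate
        specificity = tn / (tn + fp)  # true negative rate
        precision   = tp / (tp + fp)  # positive predictive value
        f_score     = 2 * precision * sensitivity / (precision + sensitivity)
        return accuracy, sensitivity, specificity, precision, f_score

    print(classification_metrics(tp=40, tn=45, fp=5, fn=10))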

  28. Exercises!

  29. Exercises MNIST handwritten digit data: 28x28 pixel images on a gray value scale, each transformed into an array of 1x784.

  30. Exercises 1. Load the MNIST data into a DataFrame with pandas. 2. Apply dimensionality reduction to the data (PCA and ISOMAP). EXTRA: find the MDS dimensionality reduction technique and apply it [Hint: sklearn library, Google]. 3. Show both results in a scatter plot and colour the points according to the label (digit) [do not forget to add a legend to the plot]. 4. Check whether the clustering methods are able to retrieve the number of classes present in the data (10 classes). Try both K-Means and DBSCAN. 5. Check the performance of the clustering methods in retrieving the classes by measuring the accuracy (HINT: correctly predicted samples / total number of samples). A rough starting point is sketched below.
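
A rough starting point for the exercises, not a full solution, assuming the digits are stored in a hypothetical "mnist.csv" with a 'label' column followed by the 784 pixel columns (adapt the path and column names to your data):

    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    # 1. Load the data into a DataFrame
    df = pd.read_csv("mnist.csv")
    labels = df["label"].values
    X = df.drop(columns="label").values

    # 2. Dimensionality reduction (repeat with Isomap / MDS for the other parts)
    X_pca = PCA(n_components=2).fit_transform(X)

    # 3. Scatter plot, coloured by digit label, with a legend
    for digit in sorted(set(labels)):
        mask = labels == digit
        plt.scatter(X_pca[mask, 0], X_pca[mask, 1], s=5, label=str(digit))
    plt.legend()
    plt.show()

    # 4. Clustering into 10 classes (try DBSCAN as well)
    kmeans_labels = KMeans(n_clusters=10).fit(X_pca).labels_

    # 5. Accuracy: correctly predicted samples / total number of samples
    #    (match cluster labels to digit labels before computing this)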
