Isomap Algorithm http://isomap.stanford.edu/ Yuri Barseghyan Yasser Essiarab

Presentation Transcript


  1. Isomap Algorithm http://isomap.stanford.edu/ Yuri Barseghyan Yasser Essiarab

  2. Linear Methods for Dimensionality Reduction • PCA (Principal Component Analysis): rotate data so that principal axes lie in direction of maximum variance • MDS (Multi-Dimensional Scaling): find coordinates that best preserve pairwise distances
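Both linear methods on this slide can be sketched in a few lines of NumPy. This is a minimal illustration under the slide's definitions, not a reference implementation; the function names are ours:

```python
import numpy as np

def pca(X, d):
    """Project X onto its top-d principal axes (directions of maximum variance)."""
    Xc = X - X.mean(axis=0)                     # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T                        # rows of Vt are the principal axes

def classical_mds(D, d):
    """Find d-dimensional coordinates whose pairwise distances best preserve D."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n         # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                 # double-centered squared distances
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:d]               # keep the top-d eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))
```

Classical MDS reappears as the final step of Isomap later in the deck, applied to geodesic rather than Euclidean distances.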

  3. Limitations of Linear methods • What if the data does not lie within a linear subspace? • Do all convex combinations of the measurements generate plausible data? • A low-dimensional non-linear manifold embedded in a higher-dimensional space http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction1.pdf

  4. Non-linear Dimensionality Reduction • What about data that cannot be described by a linear combination of latent variables? • Examples: swiss roll, s-curve • In the end, linear methods do nothing more than “globally transform” (rotate/translate/scale) the data. Sometimes we need to “unwrap” the data first (figure: PCA applied to a swiss roll) http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf
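The swiss roll is easy to generate directly; the construction below is a common parametrization (the constants are arbitrary choices of ours). Points on adjacent layers of the roll can be close in 3-D Euclidean distance while being far apart along the rolled-up 2-D sheet, which is exactly what defeats a linear projection:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# a 2-D sheet parametrized by (t, h), rolled up into 3-D
t = 1.5 * np.pi * (1 + 2 * rng.random(n))   # angle along the roll
h = 21 * rng.random(n)                      # height across the roll
X = np.column_stack([t * np.cos(t), h, t * np.sin(t)])
```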

  5. Non-linear Dimensionality Reduction • Unwrapping the data = “manifold learning” • Assume data can be embedded on a lower-dimensional manifold • Given data set X = {xi}i=1…n, find representation Y = {yi}i=1…n where Y lies on lower-dimensional manifold • Instead of preserving global pairwise distances, non-linear dimensionality reduction tries to preserve only the geometric properties of local neighborhoods

  6. Isometry • From Mathworld: two Riemannian manifolds M and N are isometric if there is a diffeomorphism such that the Riemannian metric from one pulls back to the metric on the other. For a complete Riemannian manifold: d(x, y) = geodesic distance between x and y • Informally, an isometry is a smooth invertible mapping that looks locally like a rotation plus translation • Intuitively, for 2-dimensional case, isometries include whatever physical transformations one can perform on a sheet of paper without introducing tears, holes, or self-intersections
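The informal definition, "looks locally like a rotation plus translation", can be checked numerically: a rigid motion of the plane leaves every pairwise distance unchanged, just like moving a sheet of paper without tearing it (a small sketch with arbitrary numbers):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((5, 2))                      # points on a "sheet of paper"
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Y = X @ R.T + np.array([3.0, -1.0])         # rotate, then translate

def pdist(Z):
    """Matrix of all pairwise Euclidean distances."""
    return np.linalg.norm(Z[:, None] - Z[None, :], axis=-1)
```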

  7. Trustworthiness [2] Trustworthiness quantifies how trustworthy a projection of a high-dimensional data set onto a low-dimensional space is. Specifically, a projection is trustworthy if the set of the t nearest neighbors of each data point in the low-dimensional space are also close by in the original space. Here r(i, j) is the rank of the data point j in the ordering according to the distance from i in the original data space, and Ut(i) denotes the set of those data points that are among the t nearest neighbors of the data point i in the low-dimensional space but not in the original space. The measure is M(t) = 1 − 2/(N t (2N − 3t − 1)) Σi Σj∈Ut(i) (r(i, j) − t). The maximal value that trustworthiness can take is one: the closer M(t) is to one, the better the low-dimensional space describes the original data.
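M(t) can be written down directly from the definitions above. A minimal NumPy sketch, assuming t < N/2 so the normalizer is valid; a perfect embedding (no neighbor set changes) gives M(t) = 1:

```python
import numpy as np

def trustworthiness(X, Y, t):
    """M(t) for original data X and low-dimensional embedding Y."""
    n = len(X)
    dx = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared dists, original
    dy = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # squared dists, embedding
    order_x = np.argsort(dx, axis=1)          # self lands at position 0
    rank_x = np.empty_like(order_x)
    for i in range(n):
        rank_x[i, order_x[i]] = np.arange(n)  # rank_x[i, j] = r(i, j)
    penalty = 0.0
    for i in range(n):
        nn_x = set(order_x[i, 1:t + 1].tolist())   # t-NN in the original space
        nn_y = np.argsort(dy[i])[1:t + 1]          # t-NN in the embedding
        for j in nn_y:
            if j not in nn_x:                      # j is in U_t(i)
                penalty += rank_x[i, j] - t
    return 1.0 - 2.0 / (n * t * (2 * n - 3 * t - 1)) * penalty
```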

  8. Several methods to learn a manifold • Two to start: • Isomap [Tenenbaum 2000] • Locally Linear Embeddings (LLE) [Roweis and Saul, 2000] • Recently: • Semidefinite Embeddings (SDE) [Weinberger and Saul, 2005]

  9. An important observation • Small patches on a non-linear manifold look linear • These locally linear neighborhoods can be defined in two ways • k-nearest neighbors: find the k nearest points to a given point, under some metric. Guarantees all items are similarly represented; limits dimension to k−1 • ε-ball: find all points that lie within ε of a given point, under some metric. Best if the density of items is high and every point has a sufficient number of neighbors http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction1.pdf
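The two neighborhood definitions can be sketched as follows (a minimal brute-force version; real implementations would use a spatial index):

```python
import numpy as np

def knn_neighbors(X, k):
    """Indices of each point's k nearest neighbors (self excluded)."""
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # a point is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]

def eps_ball_neighbors(X, eps):
    """For each point, indices of all points within distance eps (self excluded)."""
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return [np.flatnonzero(row <= eps) for row in d]
```

Note the trade-off visible in the code: every k-NN list has exactly k entries, while an ε-ball can be empty for an isolated point.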

  10. Isomap • Find coordinates on a lower-dimensional manifold that preserve geodesic distances instead of Euclidean distances • Key Observation: if the goal is to discover the underlying manifold, geodesic distance makes more sense than Euclidean (figure: two points with small Euclidean distance but large geodesic distance) http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction1.pdf

  11. Calculating geodesic distance • We know how to calculate Euclidean distance • Locally linear neighborhoods mean that we can approximate geodesic distance within a neighborhood using Euclidean distance • A graph is constructed by connecting each point to its K nearest neighbours • Approximate geodesic distances are calculated by finding the length of the shortest path in the graph between points • Use Dijkstra’s algorithm to fill in the remaining distances http://www.maths.lth.se/bioinformatics/calendar/20040527/NilssonJ_KI_27maj04.pdf

  12. Dijkstra’s Algorithm • Greedy best-first algorithm to compute shortest paths from one point to all other points http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf

  13. Isomap Algorithm • Compute fully-connected neighborhood of points for each item • Can be k nearest neighbors or ε-ball • Calculate pairwise Euclidean distances within each neighborhood • Use Dijkstra’s Algorithm to compute shortest path from each point to non-neighboring points • Run MDS on resulting distance matrix http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf
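The steps above fit in one short function. This is a minimal sketch, not the reference implementation from isomap.stanford.edu: it uses a symmetrized k-NN graph, Floyd–Warshall for the shortest paths for brevity (the per-node Dijkstra the slide describes is faster), and assumes the neighborhood graph is connected:

```python
import numpy as np

def isomap(X, k=5, d=2):
    """k-NN graph -> geodesic (shortest-path) distances -> classical MDS."""
    n = len(X)
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    # keep only k-nearest-neighbor edges, symmetrized; inf = no edge
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nn = np.argsort(dist, axis=1)[:, 1:k + 1]
    for i in range(n):
        G[i, nn[i]] = dist[i, nn[i]]
        G[nn[i], i] = dist[i, nn[i]]
    # Floyd-Warshall: approximate geodesics as shortest paths in the graph
    for m in range(n):
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    # classical MDS on the geodesic distance matrix
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:d]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))
```

On ten evenly spaced points along a line, the 1-D embedding recovers the spacing exactly, because every graph shortest path equals the true distance along the line.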

  14. Isomap Algorithm [3]

  15. Time Complexity of Algorithm http://www.cs.rutgers.edu/~elgammal/classes/cs536/lectures/NLDR.pdf

  16. Isomap Results Find a 2D embedding of the 3D S-curve http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf

  17. Residual Fitting Error Plotting eigenvalues from MDS will tell you dimensionality of your data http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf
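The claim can be illustrated directly: for 3-D points that actually lie on a 2-D plane, the MDS eigenvalue spectrum drops to (numerically) zero after the first two values, revealing the intrinsic dimensionality (a toy construction of ours):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.random((100, 2))            # 2-D latent coordinates
X = A @ rng.random((2, 3))          # embedded linearly into 3-D: rank-2 data
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
n = len(X)
J = np.eye(n) - np.ones((n, n)) / n
w = np.linalg.eigvalsh(-0.5 * J @ D**2 @ J)[::-1]   # MDS eigenvalues, descending
print(w[:4])   # only the first two eigenvalues are significantly nonzero
```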

  18. Neighborhood Graph http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf

  19. More Isomap Results http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf

  20. Results of projecting the face data set to two dimensions (Trustworthiness−Continuity) [1]

  21. More Isomap Results http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf

  22. Isomap Failures • Isomap has problems on closed manifolds of arbitrary topology http://www.cs.unc.edu/Courses/comp290-090-s06/Lecturenotes/DimReduction2.pdf

  23. Isomap: Advantages • Nonlinear • Globally optimal • Still produces a globally optimal low-dimensional Euclidean representation even when the input space is highly folded, twisted, or curved • Guaranteed asymptotically to recover the true dimensionality.

  24. Isomap: Disadvantages • Guaranteed asymptotically to recover the geometric structure of nonlinear manifolds • As N increases, pairwise distances provide better approximations to geodesics by “hugging the surface” more closely • Graph discreteness overestimates dM(i, j) • K must be high to avoid “linear shortcuts” near regions of high surface curvature • Mapping novel test images to the manifold space is difficult, since Isomap learns no explicit mapping function

  25. Literature
  [1] Jarkko Venna and Samuel Kaski, Nonlinear dimensionality reduction viewed as information retrieval, NIPS 2006 workshop on Novel Applications of Dimensionality Reduction, 9 Dec 2006. http://www.cis.hut.fi/projects/mi/papers/nips06_nldrws_poster.pdf
  [2] Claudio Varini, Visual Exploration of Multivariate Data in Breast Cancer by Dimensional Reduction, March 2006. http://deposit.ddb.de/cgi-bin/dokserv?idn=98073472x&dok_var=d1&dok_ext=pdf&filename=98073472x.pdf
  [3] Yiming Wu and Kap Luk Chan, An Extended Isomap Algorithm for Learning Multi-Class Manifold, Proceedings of the 2004 International Conference on Machine Learning and Cybernetics, Aug. 2004. http://ww2.cs.fsu.edu/~ywu/PDF-files/ICMLC2004.pdf
