This framework outlines the concept of Nonlinear Dimensionality Reduction (NLDR) and its significance in machine learning. Key methods such as ISOMAP and Locally Linear Embedding (LLE) are discussed, particularly in the context of gait analysis and visual perception. NLDR techniques enable the discovery of low-dimensional representations of high-dimensional data, effectively capturing the intrinsic structure of complex datasets. By contrasting linear approaches like PCA and MDS with non-linear methods, we highlight the advantages, such as global optimality and improved geometric structure recovery.
Nonlinear Dimensionality Reduction Frameworks • Rong Xu • Chan su Lee
Outline • Intuition of Nonlinear Dimensionality Reduction (NLDR) • ISOMAP • LLE • NLDR in Gait Analysis
Brain Representation • Every pixel? • Or perceptually meaningful structure? • Up-down pose • Left-right pose • Lighting direction • So your brain successfully reduces the high-dimensional inputs to an intrinsically 3-dimensional manifold!
Manifold Learning • A manifold is a topological space that is locally Euclidean • An example of a nonlinear manifold:
Manifold Learning • Discover low-dimensional representations (smooth manifolds) for data in high dimensions. • Linear approaches (PCA, MDS) vs. nonlinear approaches (ISOMAP, LLE) • Figure: latent Y, observed X
Linear Approach - PCA • PCA finds linear projections of the input data onto a lower-dimensional subspace.
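The projection described above can be sketched with NumPy as follows. This is a minimal illustration, not the presenters' code: the function name `pca_project`, the synthetic data, and the choice of the SVD to obtain the principal directions are all assumptions made for the example.

```python
import numpy as np

def pca_project(X, d):
    """Project the rows of X onto the top-d principal directions."""
    X_centered = X - X.mean(axis=0)
    # Right singular vectors of the centered data are the principal directions,
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:d]
    return X_centered @ components.T, components

rng = np.random.default_rng(0)
# Hypothetical data: points near a line in 3-D, plus a little noise.
t = rng.normal(size=(200, 1))
X = t @ np.array([[2.0, 1.0, -1.0]]) + 0.01 * rng.normal(size=(200, 3))

Y, components = pca_project(X, d=1)
print(Y.shape, components.shape)
```

Because the data lie close to a linear subspace, a single component captures almost all of the variance. On a curved manifold no such linear subspace exists, which is the limitation the later slides address.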
Linear Approach - MDS • MDS takes a matrix of pairwise distances and gives a mapping to R^d. It finds an embedding that preserves the interpoint distances, and is equivalent to PCA when those distances are Euclidean. • BUT! PCA and MDS both fail to embed nonlinear data such as the Swiss roll.
Nonlinear Approaches - ISOMAP (Josh Tenenbaum, Vin de Silva, John Langford, 2000) • Construct the neighbourhood graph G. • For each pair of points in G, compute the shortest-path distance (geodesic distance). • Apply classical MDS to the geodesic distances. • Figure: Euclidean distance vs. geodesic distance
Sample points from the Swiss Roll • Altogether there are 20,000 points in the “Swiss roll” data set. We sample 1,000 of the 20,000.
Construct neighborhood graph G • K-nearest neighbors (K = 7) • D_G is a 1000 by 1000 matrix of (Euclidean) distances between neighboring points (Figure A)
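A K-nearest-neighbor graph of this kind could be built roughly as below. This is a brute-force sketch in plain Python; the function name `knn_graph`, the dictionary representation, and the decision to store every edge in both directions (making the graph undirected) are assumptions for illustration, not details given on the slide.

```python
import math

def knn_graph(points, k):
    """Return {i: {j: distance}} linking each point to its k nearest neighbours."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        # Euclidean distance from point i to every other point, nearest first.
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for dist, j in dists[:k]:
            # Assumption: keep the edge in both directions (undirected graph).
            graph[i][j] = dist
            graph[j][i] = dist
    return graph

# Toy data: five points on a line with unit spacing, K = 2.
points = [(float(i), 0.0) for i in range(5)]
graph = knn_graph(points, k=2)
print(graph[0])
```

Pairs that are not neighbors have no edge at all, so their distance is left undefined until the shortest-path step. The brute-force search costs O(N^2) distance evaluations, which is workable for about 1,000 points.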
Compute all-pairs shortest paths in G • Now D_G is a 1000 by 1000 matrix of geodesic distances between arbitrary pairs of points along the manifold (Figure B)
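One way to fill in the all-pairs matrix is to run Dijkstra's algorithm from every node, sketched below. The slide does not say which shortest-path algorithm was used, so this choice, the function name `geodesic_distances`, and the toy graph are assumptions.

```python
import heapq
import math

def geodesic_distances(graph):
    """All-pairs shortest-path lengths in a weighted graph {i: {j: weight}}."""
    nodes = list(graph)
    D = {}
    for source in nodes:
        dist = {node: math.inf for node in nodes}
        dist[source] = 0.0
        heap = [(0.0, source)]
        while heap:
            d_u, u = heapq.heappop(heap)
            if d_u > dist[u]:
                continue  # stale queue entry, a shorter path was already found
            for v, w in graph[u].items():
                if d_u + w < dist[v]:
                    dist[v] = d_u + w
                    heapq.heappush(heap, (dist[v], v))
        D[source] = dist
    return D

# Toy graph: three sides of a unit square (0-1-2-3), with no direct 0-3 edge.
graph = {
    0: {1: 1.0},
    1: {0: 1.0, 2: 1.0},
    2: {1: 1.0, 3: 1.0},
    3: {2: 1.0},
}
D_G = geodesic_distances(graph)
print(D_G[0][3])
```

In the toy graph, corners 0 and 3 of the square are a Euclidean distance of 1 apart, but the path through the graph has length 3. The same gap between straight-line and along-the-manifold distance is what the Swiss roll exhibits. If the graph is disconnected, some entries stay infinite.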
Use MDS to embed the graph in R^d • Find a d-dimensional Euclidean space Y (Figure C) that minimizes the cost function:
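For reference, the cost function given in the Isomap paper by Tenenbaum, de Silva and Langford (2000) has the form below. It is reproduced here on the assumption that the slide showed the same expression. D_Y denotes the matrix of Euclidean distances between the embedded points, and the operator τ converts distances to inner products.

```latex
E = \lVert \tau(D_G) - \tau(D_Y) \rVert_{L^2},
\qquad
\tau(D) = -\tfrac{1}{2}\, H S H,
\quad
S_{ij} = D_{ij}^{2},
\quad
H_{ij} = \delta_{ij} - \tfrac{1}{N}
```

Here the L^2 matrix norm is the square root of the sum of squared entries, and H is the centering matrix.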
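Linear Approach - Classical MDS • Theorem: For any matrix of squared Euclidean distances, there exist points x_i and x_j such that each entry equals the squared distance between x_i and x_j. • So the points can be recovered from the distances, up to rotation and translation.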
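The standard argument behind this result can be sketched as follows. The symbols D^{(2)}, H, B, V_d and Λ_d are introduced here for illustration and are not taken from the slide. Expanding the squared distance shows that it is built from inner products, and double centering isolates them:

```latex
d_{ij}^{2} = \lVert x_i - x_j \rVert^{2}
           = x_i^{\top} x_i + x_j^{\top} x_j - 2\, x_i^{\top} x_j

B = -\tfrac{1}{2}\, H D^{(2)} H,
\qquad
H = I - \tfrac{1}{N}\,\mathbf{1}\mathbf{1}^{\top},
\qquad
b_{ij} = (x_i - \bar{x})^{\top} (x_j - \bar{x})

B = V \Lambda V^{\top}
\;\Longrightarrow\;
Y = V_d\, \Lambda_d^{1/2}
```

Here D^{(2)} is the matrix of squared distances, \bar{x} is the mean of the points, and V_d, Λ_d hold the d largest eigenvalues of B and their eigenvectors, so that the rows of Y are the embedded points.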
Solution • Y lies in R^d and consists of N points corresponding to the N original points in the input space.
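A minimal NumPy sketch of how such a solution Y can be computed by classical MDS is given below. The function name `classical_mds`, the random test data, and the clipping of negative eigenvalues are assumptions for illustration. In Isomap the input D would be the geodesic distance matrix; here Euclidean distances are used so that the result can be checked exactly.

```python
import numpy as np

def classical_mds(D, d):
    """Embed N points in R^d from an N x N matrix of pairwise distances."""
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    B = -0.5 * H @ (D ** 2) @ H            # Gram matrix of the centered points
    eigvals, eigvecs = np.linalg.eigh(B)   # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:d]    # indices of the d largest eigenvalues
    # Clip small negative eigenvalues (round-off or non-Euclidean input).
    scales = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scales        # one row per embedded point

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))               # hypothetical input points
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

Y = classical_mds(D, d=2)
D_Y = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
print(Y.shape, np.allclose(D, D_Y))
```

The embedding matches X only up to rotation, reflection, and translation, so the check compares pairwise distances rather than coordinates.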
Isomap: Advantages • Nonlinear • Globally optimal • Still produces a globally optimal low-dimensional Euclidean representation even when the input space is highly folded, twisted, or curved. • Guaranteed asymptotically to recover the true dimensionality.
Isomap: Disadvantages • May not be stable; depends on the topology of the data • Recovery of the geometric structure of nonlinear manifolds is guaranteed only asymptotically • As N increases, pairwise distances provide better approximations to geodesics, but cost more computation • If N is small, geodesic distances will be very inaccurate.