
Using Manifold Structure for Partially Labeled Classification



Presentation Transcript


  1. Using Manifold Structure for Partially Labeled Classification by Belkin and Niyogi, NIPS 2002 Presented by Chunping Wang Machine Learning Group, Duke University November 16, 2007

  2. Outline • Motivations • Algorithm Description • Theoretical Interpretation • Experimental Results • Comments

  3-5. Motivations (1) • Why is manifold structure useful? • Data lies on a lower-dimensional manifold, so dimension reduction is preferable. An example: images of a handwritten digit 0. Usually the dimensionality is the number of pixels, which is very high (256 here). Ideally, 5-dimensional features would suffice; in reality the intrinsic dimensionality is higher than that, but perhaps no more than several dozen, still far below 256. [Slide figure: a handwritten digit 0 annotated with example features f1, f2 and measurements d1, d2]

  6. Motivations (2) • Why is manifold structure useful? • The data representation in the original space is unsatisfactory for classification. [Slide figure: labeled and unlabeled points plotted in the original space and in a 2-d representation obtained with Laplacian Eigenmaps]

  7. Algorithm Description (1) Semi-supervised classification: k points, of which the first s are labeled (s < k); binary case. • Constructing the adjacency graph: set W_ij = 1 if i is among the n nearest neighbors of j or j is among the n nearest neighbors of i, and W_ij = 0 otherwise. • Eigenfunctions: for the graph Laplacian L = D - W, where D is the diagonal degree matrix with D_ii = Σ_j W_ij, compute the eigenvectors e_1, …, e_p corresponding to the p smallest eigenvalues (a sketch of both steps is given below).
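
A minimal sketch of these two steps in Python, assuming the data is already a k-by-d array X; the helper name laplacian_eigenvectors and the default parameter values are illustrative, not taken from the paper.

    import numpy as np
    from scipy.spatial.distance import cdist

    def laplacian_eigenvectors(X, n_neighbors=8, p=20):
        """Return the p eigenvectors of L = D - W with the smallest eigenvalues."""
        k = X.shape[0]
        dist = cdist(X, X)                       # pairwise Euclidean distances
        W = np.zeros((k, k))
        for i in range(k):
            # n nearest neighbors of point i (index 0 is the point itself, so skip it)
            nn = np.argsort(dist[i])[1:n_neighbors + 1]
            W[i, nn] = 1.0
        W = np.maximum(W, W.T)                   # symmetrize: edge if either point is a neighbor of the other
        D = np.diag(W.sum(axis=1))               # degree matrix
        L = D - W                                # graph Laplacian
        eigvals, eigvecs = np.linalg.eigh(L)     # eigh returns ascending eigenvalues for a symmetric matrix
        return eigvecs[:, :p]                    # k-by-p matrix E of "eigenfunction" values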

  8. Algorithm Description (2) Semi-supervised classification: k points, of which the first s are labeled (s < k); binary case with labels c_i in {-1, 1}. • Building the classifier: minimize the error function Err(a) = Σ_{i=1}^{s} (c_i - Σ_{j=1}^{p} a_j e_j(i))² over the space of coefficients a = (a_1, …, a_p); the solution is the least-squares estimate a = (E_lab^T E_lab)^{-1} E_lab^T c, where E_lab is the s-by-p matrix of eigenvector values at the labeled points. • Classifying unlabeled points (i > s): assign c_i = 1 if Σ_j a_j e_j(i) ≥ 0 and c_i = -1 otherwise (see the sketch below).
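
A minimal sketch of the classifier step, assuming E is the k-by-p eigenvector matrix from the previous sketch, the first s rows correspond to the labeled points, and the labels take values in {-1, +1}; np.linalg.lstsq is used as one standard way to obtain the least-squares coefficients.

    import numpy as np

    def fit_and_classify(E, c_labeled):
        """Fit coefficients a by least squares on the labeled points, then label the rest by sign."""
        s = len(c_labeled)
        E_lab = E[:s]                                 # eigenvector values at the labeled points
        a, *_ = np.linalg.lstsq(E_lab, c_labeled, rcond=None)
        scores = E @ a                                # f(i) = sum_j a_j e_j(i) for all k points
        preds = np.where(scores >= 0, 1, -1)          # threshold at zero (binary case)
        return preds[s:]                              # predicted labels for the unlabeled points

For example, preds = fit_and_classify(laplacian_eigenvectors(X), c) would label the remaining k - s points, where c holds the labels of the first s points.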

  9. Theoretical Interpretation (1) For a manifold M, the eigenfunctions e_i of its Laplacian Δ form a basis for the Hilbert space L²(M), i.e., any function f ∈ L²(M) can be written as f = Σ_i a_i e_i, with the eigenfunctions satisfying Δ e_i = λ_i e_i. The simplest nontrivial example: the manifold is the unit circle S¹, for which this expansion is the Fourier series.
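
Written out explicitly for the unit circle (standard facts, stated here for concreteness):

    \[
      -\frac{d^{2}}{d\theta^{2}}\, e(\theta) = \lambda\, e(\theta), \qquad e(\theta + 2\pi) = e(\theta)
    \]
    \[
      \Rightarrow\quad e \in \{1\} \cup \{\sin n\theta,\ \cos n\theta : n \ge 1\}, \qquad \lambda = n^{2},
    \]
    \[
      f(\theta) = a_{0} + \sum_{n \ge 1} \bigl( a_{n} \cos n\theta + b_{n} \sin n\theta \bigr),
    \]

i.e., expanding f in the Laplacian eigenbasis on S¹ is exactly its Fourier series.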

  10. Theoretical Interpretation (2) Smoothness measure S(f) = ∫_M |∇f|² dμ: a small S means f is "smooth". For the unit circle S¹, S(f) = ∫_0^{2π} |df/dθ|² dθ. Generally S(e_i) = λ_i, so smaller eigenvalues correspond to smoother eigenfunctions (lower frequency); the eigenfunction with λ = 0 is a constant function. In terms of the smoothest p eigenfunctions, the approximation of an arbitrary function f is f ≈ Σ_{i=1}^{p} a_i e_i.
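
The step connecting eigenvalues to smoothness is integration by parts on a manifold without boundary; a one-line sketch with normalized eigenfunctions:

    \[
      S(e_i) = \int_{\mathcal{M}} \lVert \nabla e_i \rVert^{2} \, d\mu
             = \int_{\mathcal{M}} e_i \, \Delta e_i \, d\mu
             = \lambda_i \int_{\mathcal{M}} e_i^{2} \, d\mu
             = \lambda_i ,
    \]

so the eigenfunction with λ = 0 is constant (maximally smooth) and larger eigenvalues mean higher-frequency eigenfunctions.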

  11. Theoretical Interpretation (3) Back to our problem with a finite number of points: the solution is the discrete version of the above, with the graph Laplacian L in place of Δ and its eigenvectors in place of the eigenfunctions. For binary classification the alphabet of the function f contains only two possible values; for M-ary cases the only difference is that the number of possible values is greater than two.
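
The discrete analogue of the smoothness functional is a standard graph-Laplacian identity, which is what makes the eigenvectors of L the natural basis here:

    \[
      f^{\top} L f = f^{\top} (D - W) f = \tfrac{1}{2} \sum_{i,j} W_{ij} \, (f_i - f_j)^{2},
    \]

so eigenvectors of L with small eigenvalues vary little across edges of the adjacency graph, and the classifier fits the labels using only these smooth components.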

  12. Results (1) Handwritten digit recognition (MNIST data set): 60,000 28-by-28 gray-scale images (the first 100 principal components are used as inputs), with the number of eigenvectors p set to 20% of the total number of points k.

  13. Results (2) Text classification (20 Newsgroups data set): 19,935 document vectors with dimensionality 6,000, with the number of eigenvectors p again set to 20% of the total number of points k.

  14. Comments • This semi-supervised algorithm essentially converts the original problem into a linear regression problem in a new space of lower dimensionality. • That linear regression problem is solved by standard least-squares estimation. • Only the n nearest neighbors are considered for each data point, so the adjacency matrix is sparse and the computation for the eigen-decomposition is reduced. • Little additional computation is expended after the dimensionality reduction. • More comments ……
