
Semi-Supervised Learning


Presentation Transcript


  1. Semi-Supervised Learning D. Zhou, O. Bousquet, T. Navin Lal, J. Weston, B. Schölkopf. Presented by: Tal Babaioff

  2. Semi-Supervised Learning • Use a small number of labeled examples to label a large amount of cheap unlabeled data. • Basic idea: similar examples should be given the same classification. • Typical example: web page classification, where unlabeled data are available in essentially unlimited quantity while labeling is expensive.

  3. The Cluster Assumption • The basic assumption of most semi-supervised learning algorithms: two points that are connected by a path going through high-density regions should have the same label.

  4. Example

  5. Basic Approaches • Use a weighted graph whose weights represent point similarity: • K nearest neighbors – the most naive approach. • Random walk on the graph: a particle starts from an unlabeled node i and moves to node j with probability P_ij; the walk continues until the particle hits a labeled node. Node i is classified with the label it is most likely to hit first (see the sketch below).
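Below is a minimal NumPy sketch of this absorbing-random-walk classification. The tiny weight matrix, the labeled indices, and their classes are invented for illustration; the solve B = (I − P_uu)^(−1) P_ul is the standard way to obtain the absorption (hitting) probabilities.

```python
# A minimal sketch of the absorbing random-walk approach on a toy graph.
import numpy as np

# Symmetric similarity weights for 5 points (0 on the diagonal); illustrative only.
W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

labeled = [0, 4]          # indices of labeled points
labels = {0: 0, 4: 1}     # their classes
unlabeled = [1, 2, 3]

# Transition probabilities P_ij = W_ij / sum_k W_ik.
P = W / W.sum(axis=1, keepdims=True)

# Partition P into blocks over unlabeled (u) and labeled (l) nodes.
P_uu = P[np.ix_(unlabeled, unlabeled)]
P_ul = P[np.ix_(unlabeled, labeled)]

# Probability that a walk started at each unlabeled node is absorbed at each
# labeled node: B = (I - P_uu)^(-1) P_ul.
B = np.linalg.solve(np.eye(len(unlabeled)) - P_uu, P_ul)

# Classify each unlabeled node by the label it is most likely to hit.
for row, i in zip(B, unlabeled):
    class_probs = {}
    for col, j in enumerate(labeled):
        class_probs[labels[j]] = class_probs.get(labels[j], 0.0) + row[col]
    print(i, max(class_probs, key=class_probs.get))
```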

  6. Basic Approaches • An electrical network: connect all points labeled 1 to a positive voltage source and all points labeled 0 to a negative one. The graph edges are resistors with conductances W. The classification of each unlabeled point is determined by its voltage in the resulting electrical network.

  7. Other Approaches • Harmonic energy minimization: use a Gaussian field over a continuous state space, with weights given by a similarity function between points.
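The harmonic solution from the Zhu, Ghahramani and Lafferty paper cited on the references slide amounts to clamping the labeled points to their labels and solving a Laplacian linear system; the resulting values are exactly the node voltages of the electrical-network view above. A minimal sketch, assuming toy one-dimensional data and a binary 0/1 labeling:

```python
# A minimal sketch of the harmonic solution: clamp labeled points to 0/1 and
# solve the Laplacian system for the unlabeled ones. Toy data, illustrative only.
import numpy as np

# Points on a line; similarity falls off with squared distance.
X = np.array([0.0, 0.2, 0.4, 2.0, 2.2])
sigma = 0.5
W = np.exp(-(X[:, None] - X[None, :]) ** 2 / (2 * sigma ** 2))
np.fill_diagonal(W, 0.0)
D = np.diag(W.sum(axis=1))
L = D - W                     # graph Laplacian

labeled, unlabeled = [0, 4], [1, 2, 3]
f_l = np.array([0.0, 1.0])    # labels (0 or 1) of the labeled points

# Harmonic solution: L_uu f_u = W_ul f_l, i.e. the "voltages" of the unlabeled
# nodes when the labeled nodes are clamped to 0 V and 1 V.
L_uu = L[np.ix_(unlabeled, unlabeled)]
W_ul = W[np.ix_(unlabeled, labeled)]
f_u = np.linalg.solve(L_uu, W_ul @ f_l)

# Threshold at 0.5 to classify.
print((f_u > 0.5).astype(int))
```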

  8. The Consistency Assumption • Points in the same local high-density region are more similar to each other (and thus likely to have the same label) than to points outside this region (local consistency). • Points on the same global structure (a cluster or a manifold) are more similar to each other than to points outside of this structure (global consistency).

  9. Consistency Assumption Example

  10. Consistency Assumption Example

  11. Formal Representation • X = {x1, …, xl, xl+1, …, xn} ⊂ R^m • Label set L = {1, …, c} • The first l points have labels yi ∈ {1, …, c}. • For points with i > l, yi is unknown. • The error is measured on the unlabeled examples only.

  12. Basic Ideas For The Algorithm • Define a similarity function that changes slowly within local high-density regions and varies along the global structure of the manifold on which the data points lie. • Define an activation network, represented as a graph whose weights are determined by the similarity of each pair of points.

  13. Basic Ideas For The Algorithm • Use the labeled points as sources that pump the different class labels through the graph, and use the newly labeled points as additional sources, until a stable state is reached. • Each unlabeled point is assigned the class from which it has received the most information during the iteration process.

  14. Algorithm: Data Structure • Given a set of points X = {x1, …, xl, xl+1, …, xn}. • The first l points have labels yi ∈ {1, …, c}; the rest are unlabeled. • The classification is represented by an [n × c] non-negative matrix F; the classification of point xi is yi = argmax_{j ≤ c} F_ij. • Let Y be an [n × c] matrix with elements Y_ij = 1 if point i carries the label yi = j, and 0 otherwise. A small sketch of these structures follows.
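A small sketch of these data structures; the sizes, the partial labeling, and the random F below are placeholders for illustration only.

```python
# Toy sketch of the [n x c] label matrix Y and the argmax decoding of F.
import numpy as np

n, c = 6, 3                       # n points, c classes
y_given = {0: 0, 1: 2}            # only the first l points carry labels

Y = np.zeros((n, c))
for i, label in y_given.items():
    Y[i, label] = 1.0             # Y_ij = 1 iff point i is labeled j

# Suppose F is the non-negative score matrix produced by the algorithm;
# here it is random just to show the decoding step.
F = np.abs(np.random.rand(n, c))
y_pred = F.argmax(axis=1)         # y_i = argmax_j F_ij
print(Y)
print(y_pred)
```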

  15. The Consistency Algorithm • Form the affinity matrix W defined by W_ij = exp(−‖xi − xj‖² / 2σ²) if i ≠ j, and W_ii = 0. • Compute the matrix S = D^(−1/2) W D^(−1/2), where D is a diagonal matrix whose (i, i) element equals the sum of the i-th row of W. The spectrum of S reflects the spectral-clustering structure of the data. A sketch of these two steps follows.
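A sketch of these two steps, with an arbitrary toy data set and an illustrative choice of σ.

```python
# First two steps of the consistency algorithm: RBF affinity matrix W with a
# zero diagonal, then the symmetric normalization S = D^(-1/2) W D^(-1/2).
import numpy as np

X = np.random.rand(8, 2)          # 8 toy points in R^2
sigma = 0.3                       # illustrative bandwidth

sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq_dists / (2 * sigma ** 2))
np.fill_diagonal(W, 0.0)          # W_ii = 0

d = W.sum(axis=1)                 # D is diagonal with the row sums of W
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
S = D_inv_sqrt @ W @ D_inv_sqrt   # symmetric normalization
```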

  16. The Consistency Algorithm • Iterate F(t+1) = αSF(t) + (1 − α)Y until convergence, with α ∈ (0, 1). • Let F* denote the limit of the sequence {F(t)}. Label each unlabeled point xi by yi = argmax_{j ≤ c} F*_ij. The iteration step is sketched below.
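A sketch of the iteration step, assuming S and Y have been built as in the previous sketch; the tolerance and iteration cap are illustrative choices.

```python
# Iterate F(t+1) = alpha * S @ F(t) + (1 - alpha) * Y until convergence.
import numpy as np

def propagate(S, Y, alpha=0.99, tol=1e-9, max_iter=10_000):
    F = Y.copy()
    for _ in range(max_iter):
        F_next = alpha * (S @ F) + (1 - alpha) * Y
        if np.abs(F_next - F).max() < tol:       # converged
            return F_next
        F = F_next
    return F

# y_i = argmax_j F*_ij gives the predicted class of each unlabeled point:
# y_pred = propagate(S, Y).argmax(axis=1)
```

In practice the fixed point can also be computed directly, as the next slides show.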

  17. Consistency Algorithm – Convergence • Show that the algorithm converges to F* = (1 − α)(I − αS)^(−1) Y. • Without loss of generality, let F(0) = Y. • F(t+1) = αSF(t) + (1 − α)Y, • and therefore F(t) = (αS)^t Y + (1 − α) Σ_{i=0..t−1} (αS)^i Y.

  18. Consistency Algorithm – Convergence Show that the algorithm converges to F* = (1 − α)(I − αS)^(−1) Y. F(t) = (αS)^t Y + (1 − α) Σ_{i=0..t−1} (αS)^i Y. Since 0 < α < 1 and the eigenvalues of S lie in [−1, 1]: lim_{t→∞} (αS)^t = 0 and lim_{t→∞} Σ_{i=0..t−1} (αS)^i = (I − αS)^(−1). Hence F* = lim_{t→∞} F(t) = (1 − α)(I − αS)^(−1) Y.
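The same fixed point written as a direct linear solve (the closed form derived on this slide); S and Y as in the earlier sketches.

```python
# Closed form of the consistency algorithm: F* = (1 - alpha)(I - alpha S)^(-1) Y.
import numpy as np

def propagate_closed_form(S, Y, alpha=0.99):
    n = S.shape[0]
    return (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, Y)
```

Since the factor (1 − α) only rescales every row of F*, it does not affect the argmax; it is kept here to match the formula above.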

  19. Regularization Framework • Define a cost function for the iteration stage. • The classifying function is the minimizer of this cost, F* = argmin Q(F). • Smoothness constraint: a good classifying function should not change too much between nearby points.

  20. Regularization Framework • Fitting constraint: a good classifying function should not change too much from the initial label assignment. • μ > 0: trade-off between the two constraints.

  21. Regularization Framework
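The cost function itself appears only as an image on these slides; reconstructed from the referenced Zhou et al. paper, it reads:

```latex
% Cost function of the regularization framework (reconstruction from the
% referenced Zhou et al. paper; the slide images are not in the transcript).
\begin{align}
Q(F) &= \frac{1}{2}\Bigg(
   \sum_{i,j=1}^{n} W_{ij}
   \left\| \frac{F_i}{\sqrt{D_{ii}}} - \frac{F_j}{\sqrt{D_{jj}}} \right\|^2
   \;+\; \mu \sum_{i=1}^{n} \left\| F_i - Y_i \right\|^2 \Bigg), \\
F^{*} &= \arg\min_{F} Q(F)
       \;=\; (1-\alpha)\,(I - \alpha S)^{-1} Y,
\qquad \alpha = \tfrac{1}{1+\mu}.
\end{align}
```

The first term is the smoothness constraint, the second the fitting constraint, and μ > 0 is the trade-off between them; minimizing Q recovers the closed form used earlier with α = 1/(1 + μ).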

  22. Results: Two Moon Toy Problem

  23. Results: Two Moon Toy Problem

  24. Results: Two Moon Toy Problem

  25. Results: Two Moon Toy Problem

  26. Results: Two Moon Toy Problem

  27. Results: Two Moon Toy Problem

  28. Results: Digit Recognition • Run the algorithm on the USPS database with digits 1, 2, 3, 4. • Class sizes are 1269, 929, 824, and 852 (3874 in total). • Test errors are averaged over 30 trials. • The labeled samples are chosen so that they contain at least one point of each class. A sketch of this evaluation protocol follows.
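A sketch of this evaluation protocol, using scikit-learn's load_digits (restricted to digits 1-4) as a stand-in for USPS; the 20 labeled points per trial, σ, α, and the 10-trial count are illustrative choices, not the paper's settings.

```python
# Evaluation protocol sketch: resample labeled points until every class is
# covered, run the consistency method, and measure error on unlabeled points.
import numpy as np
from sklearn.datasets import load_digits

def consistency_method(X, labeled_idx, y, n_classes, sigma=1.25, alpha=0.95):
    # Steps of the consistency algorithm (slides 15-18), in closed form.
    sq = (X ** 2).sum(1)[:, None] + (X ** 2).sum(1)[None, :] - 2 * X @ X.T
    W = np.exp(-np.maximum(sq, 0.0) / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))               # D^(-1/2) W D^(-1/2)
    Y = np.zeros((len(X), n_classes))
    Y[labeled_idx, y[labeled_idx]] = 1.0
    F = (1 - alpha) * np.linalg.solve(np.eye(len(X)) - alpha * S, Y)
    return F.argmax(axis=1)

digits = load_digits()                            # stand-in for USPS
mask = np.isin(digits.target, [1, 2, 3, 4])
X, y = digits.data[mask] / 16.0, digits.target[mask] - 1   # classes 0..3

rng = np.random.default_rng(0)
errors = []
for _ in range(10):                               # the slides average 30 trials
    while True:                                   # resample until every class
        labeled = rng.choice(len(X), size=20, replace=False)
        if len(np.unique(y[labeled])) == 4:       # has a labeled point
            break
    pred = consistency_method(X, labeled, y, n_classes=4)
    unlabeled = np.setdiff1d(np.arange(len(X)), labeled)
    errors.append(float((pred[unlabeled] != y[unlabeled]).mean()))

print("mean test error:", np.mean(errors))
```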

  29. Results: Digit Recognition

  30. Results: Digit Recognition

  31. Results: Digit Recognition • Results averaged over 100 trials.

  32. Results: Text Classification • Use the Mac and Windows subsets of the 20 newsgroups data set. • There are 961 and 985 examples in the two classes, with 7511 dimensions.

  33. Results: Text Classification

  34. Results: Text Classification 2 • Use the topic “rec”, which contains the autos, motorcycles, baseball, and hockey subsets. • Preprocessing: • Remove endings from all words (e.g. -ing, -ed, …). • Drop words on the SMART stop-word list (the, of, …). • Ignore the headers. • Use only words that appear in 5 or more articles. • Data set size: 3970 document vectors in an 8014-dimensional space. A rough preprocessing sketch follows.
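A rough sketch of a comparable preprocessing pipeline using scikit-learn (this downloads the 20 newsgroups data on first use). The built-in "english" stop-word list stands in for the SMART list and a naive suffix stripper stands in for real stemming; neither is what the authors used.

```python
# Approximate preprocessing for the "rec" topic of 20 newsgroups.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer

categories = ["rec.autos", "rec.motorcycles",
              "rec.sport.baseball", "rec.sport.hockey"]
# remove=("headers",) drops the message headers, as the slide describes.
data = fetch_20newsgroups(subset="all", categories=categories,
                          remove=("headers",))

base_analyzer = CountVectorizer(stop_words="english").build_analyzer()

def strip_ending(token):
    # Very naive stand-in for the "remove endings" step (-ing, -ed, ...).
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def analyzer(doc):
    return [strip_ending(tok) for tok in base_analyzer(doc)]

# min_df=5 keeps only words that appear in 5 or more articles.
vectorizer = CountVectorizer(analyzer=analyzer, min_df=5)
X = vectorizer.fit_transform(data.data)
print(X.shape)     # (number of documents, vocabulary size)
```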

  35. Results: Text Classification 2

  36. References • Learning with Local and Global Consistency. Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, Bernhard Schölkopf. http://www.kyb.mpg.de/publications/pdfs/pdf2333.pdf • Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions. Xiaojin Zhu, Zoubin Ghahramani, John Lafferty. http://www.hpl.hp.com/conferences/icml2003/papers/132.pdf

  37. The End
