
An introduction to self-taught learning




  1. An introduction to self-taught learning Presented by: Zenglin Xu, 10-09-2007 [Raina et al., 2007] Self-taught Learning: Transfer Learning from Unlabeled Data

  2. Outline • Related learning paradigms • A self-taught learning algorithm

  3. Related learning paradigms • Semi-supervised learning • Transfer learning • Multi-task learning • Domain adaptation • Biased sample selection • Self-taught learning

  4. Semi-supervised learning • In addition to the labeled training data, a large set of unlabeled test data is available • The training data and test data are drawn from the same distribution • The unlabeled data can be assigned the class labels of the supervised learning task • Reference • [Chapelle et al., 2006] Semi-supervised learning • [Zhu, 2005] Semi-supervised learning literature survey

  5. Transfer learning • The theory of transfer of learning was introduced by Thorndike and Woodworth (1901), who explored how individuals transfer learning from one context to another context that shares similar characteristics • Transfers knowledge from one supervised task to another; requires labeled data from a different but related task • E.g., transferring knowledge learned on Newsgroup data to Reuters data • Related work in computer science • [Thrun & Mitchell, 1995] Learning one more thing • [Ando & Zhang, 2005] A framework for learning predictive structures from multiple tasks and unlabeled data

  6. Multi-task learning • Learns a problem together with other related problems at the same time, using a shared representation • This often yields a better model for the main task, because the learner can exploit the commonality among the tasks • Multi-task learning is a kind of inductive transfer: tasks are learned in parallel over a shared representation, so what is learned for each task can help the other tasks be learned better • Reference • [Caruana, 1997] Multitask learning • [Ben-David & Schuller, 2003] Exploiting task relatedness for multiple task learning

  7. Domain adaptation • A term popular in natural language processing • It can indeed be regarded as transfer learning • The supervised setting usually involves: • A large pool of out-of-domain labeled data • A small pool of in-domain labeled data • Reference • [Daume III, 2007] Frustratingly Easy Domain Adaptation • [Daume III & Marcu, 2006] Domain Adaptation for Statistical Classifiers • [Ben-David et al., 2006] Analysis of Representations for Domain Adaptation

  8. Biased sample selection • Also called covariate shift • It deals with the case where the training data and test data are drawn from different distributions over the same domain • The objective is to correct this bias • Reference • [Shimodaira, 2000] Improving predictive inference under covariate shift… • [Zadrozny, 2004] Learning and evaluating classifiers under sample selection bias • [Bickel et al., 2007] Discriminative learning for differing training and test distributions
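
As an aside that the slide does not spell out: the usual way to correct the bias is importance weighting, reweighting each training example by the ratio of test to training densities (a standard textbook formulation, not taken from the slides):

```latex
% Importance-weighted empirical risk under covariate shift
\[
  \hat{R}(f) \;=\; \frac{1}{m}\sum_{i=1}^{m}
    \frac{p_{\text{test}}(x_i)}{p_{\text{train}}(x_i)}\,
    \ell\bigl(f(x_i),\, y_i\bigr)
\]
```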

  9. Self-taught learning • Uses unlabeled data • Does not require the unlabeled data to come from the same generative distribution • The unlabeled data can have labels different from those of the supervised learning task's data • Reference • [Raina et al., 2007] Self-taught learning: transfer learning from unlabeled data

  10. Outline • Related learning paradigms • A self-taught learning algorithm • Algorithm • Experiment

  11. Sparse coding – a self-taught learning algorithm • Learn a higher-level feature representation from unlabeled data, e.g., random unlabeled images usually contain basic visual patterns (such as edges) that are similar to those in the images (such as an elephant) that need to be classified • Apply the representation to the labeled data and use it for classification

  12. Step 1 – learning higher-level representations Given unlabeled data x^(1), …, x^(m), optimize: minimize over b, a of Σ_i ‖x^(i) − Σ_j a_j^(i) b_j‖² + β ‖a^(i)‖_1, subject to ‖b_j‖ ≤ 1 for all j, where the b_j are the basis vectors and the a^(i) are the activations
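
The paper uses its own efficient solver for this problem; the following is only a rough sketch (not the authors' implementation) of how the objective can be minimized by alternating between the activations and the bases. The function name learn_bases, the hyperparameter values, and the use of scikit-learn's Lasso for the L1 step are all illustrative assumptions.

```python
# Sketch only: alternating minimization of the sparse-coding objective above.
# Hyperparameters (num_bases, beta, n_iter) are illustrative, not from the paper.
import numpy as np
from sklearn.linear_model import Lasso

def learn_bases(X, num_bases=64, beta=0.4, n_iter=20, seed=0):
    """Approximately solve min_{B,A} sum_i ||x_i - B a_i||^2 + beta*||a_i||_1, s.t. ||b_j|| <= 1."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    B = rng.normal(size=(n, num_bases))
    B /= np.linalg.norm(B, axis=0, keepdims=True)           # start with unit-norm bases
    # sklearn's Lasso minimizes (1/2n)||y - Xw||^2 + alpha*||w||_1, hence the rescaled alpha
    lasso = Lasso(alpha=beta / (2 * n), fit_intercept=False, max_iter=5000)
    for _ in range(n_iter):
        # (a) bases fixed: each activation vector is an L1-regularized least-squares problem
        A = np.array([lasso.fit(B, x).coef_ for x in X])     # shape (m, num_bases)
        # (b) activations fixed: least-squares update of the bases, then project onto ||b_j|| <= 1
        B = np.linalg.lstsq(A, X, rcond=None)[0].T           # shape (n, num_bases)
        B /= np.maximum(np.linalg.norm(B, axis=0, keepdims=True), 1.0)
    return B, A
```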

  13. Bases learned from image patches and speech data

  14. Step 2: apply the representation to the labeled data and use it for classification
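
A minimal sketch of this step, assuming the bases B come from the previous sketch; the helper name encode, the beta value, and the choice of a linear SVM are illustrative assumptions rather than anything prescribed by the slides.

```python
# Sketch only: re-represent labeled examples by their sparse activations, then classify.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.svm import LinearSVC

def encode(X, B, beta=0.4):
    """Feature for x: argmin_a ||x - B a||^2 + beta*||a||_1, with the bases B held fixed."""
    n = B.shape[0]
    lasso = Lasso(alpha=beta / (2 * n), fit_intercept=False, max_iter=5000)
    return np.array([lasso.fit(B, x).coef_ for x in X])

def train_classifier(X_labeled, y_labeled, B):
    features = encode(X_labeled, B)              # higher-level representation from unlabeled-data bases
    return LinearSVC().fit(features, y_labeled)  # any standard supervised learner could be used
```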

  15. High-level features computed • Using a set of 512 learned image bases (Figure 2, left), Figure 3 illustrates a solution to the previous optimization problem

  16. High-level features computed

  17. High-level features computed

  18. Connection to PCA

  19. Connection to PCA • PCA performs linear feature extraction, in that the features a_j^(i) are simply a linear function of the input • The PCA bases b_j must be orthogonal, so the number of PCA features cannot exceed the dimension n of the input • Sparse coding has neither of these limitations
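
A small illustration of the contrast drawn here (the shapes and numbers are made up for the example): PCA features are a linear (affine) map of the input and capped at n components, whereas sparse-coded activations are a non-linear function of the input and the number of bases may exceed n.

```python
# Illustrative comparison only; relies on encode() and bases B from the earlier sketches.
import numpy as np
from sklearn.decomposition import PCA

n, m = 64, 1000                          # input dimension and number of examples (made up)
X = np.random.randn(m, n)

pca = PCA(n_components=n)                # asking for more than n components is impossible
pca_features = pca.fit_transform(X)      # a linear (affine) function of the input

# Sparse coding has neither restriction: B could have, say, 512 columns for 64-dimensional
# inputs, and encode(X, B) is not linear in X because the L1 penalty makes the argmin
# a non-linear map of the input.
```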

  20. Outline • Related learning paradigms • A self-taught learning algorithm • Algorithm • Experiment

  21. Experiment setting

  22. Experiment setting

  23. Experimental results on image

  24. Experimental results on characters

  25. Experimental results on music data

  26. Experimental results on text data

  27. Compare with results using features trained on labeled data Table 7. Accuracy on the self-taught learning tasks when sparse coding bases are learned on unlabeled data (third column), or when principal components/sparse coding bases are learned on the labeled training set (fourth/fifth column).

  28. Discussion • Is it useful to learn a high-level feature representation in a unified process using both the labeled data and the unlabeled data? • How does the similarity between the labeled data and the unlabeled data affect performance? • And more?
