
Classification / Prediction

Presentation Transcript


  1. Classification / Prediction

  2. “Supervised Learning”: use a “training set” of examples to create a model that is able to predict, given an unknown sample, which of two or more classes that sample belongs to. [Slide graphic: “Classification?”]

  3. What we’ll cover: how to build a classifier; how to evaluate a classifier; using GenePattern to classify expression data.

  4. What Is a Classifier: a predictive rule that uses a set of inputs (genes) to predict the value of the output (phenotype). Known examples (the training data) are used to build the predictive rule. Goal: achieve high generalization (predictive) power and avoid over-fitting.

  5. Classification: computational methodology. Data + known classes → assess gene-class correlation → feature (marker) selection → build classifier → test classifier by cross-validation → evaluate classifier on independent test set.

  6. The same workflow, with candidate methods at the “build classifier” step: regression trees (CART), weighted voting, k-nearest neighbors, support vector machines. Expression data + known classes → assess gene-class correlation → feature selection → build classifier → test classifier by cross-validation → evaluate classifier on independent test set.

  7. Classifiers, important issues: few cases, many variables (genes); redundancy: many highly correlated genes; noise: measurements are very imprecise; feature selection: reducing p (the number of genes) is a necessity; avoid over-fitting.
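
A minimal sketch of marker selection by gene-class correlation, assuming the Golub-style signal-to-noise statistic often used for this step in the expression-classification literature; the simulated data, variable names, and the choice of 20 markers are illustrative, not from the slides:

```python
# Sketch: rank genes by signal-to-noise ratio (SNR) against a binary
# phenotype, then keep the top-scoring markers. All data here are simulated.
import numpy as np

def signal_to_noise(X, y):
    """Per-gene SNR: (mean0 - mean1) / (sd0 + sd1), with X as samples x genes."""
    a, b = X[y == 0], X[y == 1]
    return (a.mean(axis=0) - b.mean(axis=0)) / (a.std(axis=0) + b.std(axis=0))

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 1000))            # few cases (30), many genes (1000)
y = np.array([0] * 15 + [1] * 15)
X[y == 1, :20] += 2.0                      # plant 20 truly informative genes

snr = signal_to_noise(X, y)
markers = np.argsort(-np.abs(snr))[:20]    # top 20 genes by |SNR|
print(sorted(markers.tolist()))            # mostly recovers indices 0..19
```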

  8. K-nn classifier example: K = 5, 2 genes, 2 classes. Project the training samples into gene space. [Figure: samples plotted against gene 1 and gene 2, with the two classes shown in black and orange.]

  9. K-nn classifier example, continued: project the unknown sample into the same gene space. [Figure: the unknown sample (“?”) plotted among the black and orange samples against gene 1 and gene 2.]

  10. K-nn classifier example, continued: “consult” the 5 closest neighbors: 3 black, 2 orange, so the unknown sample is predicted black. Distance measures: Euclidean distance, 1 − Pearson correlation, … [Figure: the unknown sample and its 5 nearest neighbors in gene space.]
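
A minimal sketch of the slides’ K-nn example, assuming scikit-learn (the slides themselves use GenePattern modules); the toy coordinates are invented for illustration:

```python
# Sketch: K-nn with K = 5 on two genes and two classes. Euclidean distance
# is scikit-learn's default; metric="correlation" (with algorithm="brute")
# would give a 1 - Pearson-correlation style distance instead.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Training samples projected into gene space (gene 1, gene 2).
X_train = np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [0.9, 1.1], [1.2, 1.0],   # black
                    [3.0, 3.1], [2.9, 3.0], [3.2, 2.8], [3.1, 3.2], [2.8, 2.9]])  # orange
y_train = ["black"] * 5 + ["orange"] * 5

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print(knn.predict([[1.5, 1.4]]))  # the 5 nearest neighbors vote -> ['black']
```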

  11. Support Vector Machine (SVM). [Figure from Noble, Nat Biotech 2006.]
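
The slide is only a pointer to the Noble (2006) primer; as a companion, a minimal linear-SVM sketch on simulated expression-like data, again assuming scikit-learn:

```python
# Sketch: a linear SVM separating two simulated phenotype classes.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 100))     # 40 samples, 100 genes
y = np.array([0] * 20 + [1] * 20)
X[y == 1, :5] += 1.5               # 5 genes shifted between classes

svm = SVC(kernel="linear", C=1.0).fit(X, y)
print(svm.score(X, y))             # training accuracy (optimistic by design)
```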

  12. Weighted Voting (Slonim et al., RECOMB 2000). Mixture of Experts approach: each gene casts a vote for one of the possible classes, and the vote is weighted by a score assessing the reliability of the expert (in this case, the gene). The class receiving the highest total vote is the predicted class, and the vote margin can be used as a proxy for the probability of class membership (prediction strength). [Figure: a new sample compared with the class 1 and class 2 centroids across genes g1, g2, …, gn.]
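
A minimal sketch of weighted voting in this style, assuming the commonly described choices of a signal-to-noise weight per gene and a decision boundary halfway between the class means; the helper names (train_wv, predict_wv) and the simulated data are hypothetical:

```python
# Sketch: each gene votes with weight * (expression - boundary); positive
# votes favor class 0, negative votes favor class 1.
import numpy as np

def train_wv(X, y):
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    s0, s1 = X[y == 0].std(axis=0), X[y == 1].std(axis=0)
    weight = (m0 - m1) / (s0 + s1)    # signal-to-noise score per gene
    boundary = (m0 + m1) / 2          # halfway between the class means
    return weight, boundary

def predict_wv(x, weight, boundary):
    votes = weight * (x - boundary)
    v0, v1 = votes[votes > 0].sum(), -votes[votes < 0].sum()
    label = 0 if v0 > v1 else 1
    strength = abs(v0 - v1) / (v0 + v1)   # proxy for class-membership confidence
    return label, strength

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 100)); y = np.array([0] * 15 + [1] * 15)
X[y == 0, :10] += 2.0                 # 10 genes higher in class 0
weight, boundary = train_wv(X, y)
print(predict_wv(X[0], weight, boundary))   # -> (0, prediction strength)
```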

  13. Class prediction: computational methodology. Candidate methods: regression trees (CART), weighted voting, k-nearest neighbors, support vector machines. Expression data + known classes → assess gene-class correlation → feature selection → build predictor → test predictor by cross-validation → evaluate predictor on independent test set.

  14. Testing the classifier: evaluation on an independent test set. Build the classifier on the train set, then assess prediction performance on the test set; the aim is to maximize generalization and avoid overfitting. Performance measure: error rate = (# of cases misclassified) / (total # of cases).
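
A minimal sketch of this train/test evaluation, assuming scikit-learn and simulated data; the K-nn model is just a stand-in for whichever classifier was built:

```python
# Sketch: fit on the train split, report the error rate on the held-out test split.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 50)); y = np.array([0] * 30 + [1] * 30)
X[y == 1, :5] += 1.5

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33,
                                          stratify=y, random_state=0)
model = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
misclassified = int((model.predict(X_te) != y_te).sum())
print("error rate:", misclassified / len(y_te))
```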

  15. Testing the classifier: what if we don’t have an independent test set? Cross-validation (XV): split the dataset into n folds (e.g., 10 folds of 10 cases each). For each fold, train (i.e., build the model) on the other n − 1 folds (e.g., on 90 cases) and test (i.e., predict) on the left-out fold (e.g., on the remaining 10 cases); then combine the test results. With small sample sizes, leave-one-out XV is frequently used.
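
A minimal sketch of the fold loop just described, assuming scikit-learn; swapping StratifiedKFold for LeaveOneOut gives the leave-one-out variant mentioned for small sample sizes:

```python
# Sketch: 10-fold cross-validation; train on 9 folds, predict the left-out
# fold, and pool the misclassifications across folds.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 50)); y = np.array([0] * 50 + [1] * 50)
X[y == 1, :5] += 1.0

errors = 0
for train_idx, test_idx in StratifiedKFold(n_splits=10).split(X, y):
    model = KNeighborsClassifier(n_neighbors=5).fit(X[train_idx], y[train_idx])
    errors += int((model.predict(X[test_idx]) != y[test_idx]).sum())
print("cross-validated error rate:", errors / len(y))
```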

  16. Testing the classifier. [Figure: learning curves from leave-one-out cross-validation, ALL vs. MLL vs. AML.]

  17. Testing the classifier: error rate estimates. Evaluation on an independent test set gives the best error estimate. Cross-validation is needed when the sample size is small, and for model selection.

  18. Classification cookbook: start by splitting the data into a train and a test set (stratified), and “forget” about the test set until the very end. Explore different feature selection methods and different classifiers on the train set by XV. Once the “best” classifier and the best classifier parameters have been selected (based on XV), build a classifier with those parameters on the entire train set, then apply it to the test set.
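
A minimal end-to-end sketch of the cookbook, assuming scikit-learn rather than the GenePattern modules listed on the next slide; keeping feature selection inside the cross-validated pipeline mirrors the advice to leave the test set untouched until the end:

```python
# Sketch: stratified split; choose markers and model settings by 10-fold XV
# on the train set only; refit on the full train set; score the test set once.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(5)
X = rng.normal(size=(120, 200)); y = np.array([0] * 60 + [1] * 60)
X[y == 1, :10] += 1.0

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)

# Feature selection sits inside the pipeline so each XV fold reselects
# markers from its own training folds (no leakage from held-out cases).
pipe = Pipeline([("select", SelectKBest(f_classif)),
                 ("knn", KNeighborsClassifier())])
grid = {"select__k": [10, 50, 100], "knn__n_neighbors": [3, 5, 7]}
search = GridSearchCV(pipe, grid, cv=10).fit(X_tr, y_tr)

print("chosen settings:", search.best_params_)
print("test-set error rate:", 1 - search.score(X_te, y_te))
```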

  19. Classification GenePattern methods. Split the data into train and test sets: SplitDatasetTrainTest. Explore different feature selection methods and different classifiers on the train set by XV: CARTXValidation, KNNXValidation, WeightedVotingXValidation. Once the “best” classifier and the best classifier parameters have been selected (based on XV), run the corresponding module: CART, KNN, WeightedVoting. Examine the results: PredictionResultsViewer, FeatureSummaryViewer.

  20. Classification Example
