
Classification / Prediction

Presentation Transcript


  1. Classification / Prediction

  2. “Supervised Learning”: use a “training set” of examples to create a model that is able to predict, given an unknown sample, which of two or more classes that sample belongs to. [Slide graphic: “Classification?”]

  3. What we’ll cover: how to build a classifier; how to evaluate a classifier; using GenePattern to classify expression data.

  4. What Is a Classifier: a predictive rule that uses a set of inputs (genes) to predict the value of the output (phenotype). Known examples (the training data) are used to build the predictive rule. Goal: achieve high generalization (predictive) power and avoid over-fitting.

  5. Classification: computational methodology. Data + known classes → assess gene-class correlation → feature (marker) selection → build classifier → test classifier by cross-validation → evaluate classifier on independent test set.

  6. The same workflow, with candidate methods at the “build classifier” step: regression trees (CART), weighted voting, k-nearest neighbors, support vector machines. Expression data + known classes → assess gene-class correlation → feature selection → build classifier → test classifier by cross-validation → evaluate classifier on independent test set.

  7. Classifiers, important issues: few cases, many variables (genes); redundancy: many highly correlated genes; noise: measurements are very imprecise; feature selection: reducing p (the number of genes) is a necessity; avoid over-fitting.
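
A minimal sketch of marker selection by gene-class correlation, assuming the Golub-style signal-to-noise statistic often used for this step in the expression-classification literature; the simulated data, variable names, and the choice of 20 markers are illustrative, not from the slides:

```python
# Sketch: rank genes by signal-to-noise ratio (SNR) against a binary
# phenotype, then keep the top-scoring markers. All data here are simulated.
import numpy as np

def signal_to_noise(X, y):
    """Per-gene SNR: (mean0 - mean1) / (sd0 + sd1), with X as samples x genes."""
    a, b = X[y == 0], X[y == 1]
    return (a.mean(axis=0) - b.mean(axis=0)) / (a.std(axis=0) + b.std(axis=0))

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 1000))            # few cases (30), many genes (1000)
y = np.array([0] * 15 + [1] * 15)
X[y == 1, :20] += 2.0                      # plant 20 truly informative genes

snr = signal_to_noise(X, y)
markers = np.argsort(-np.abs(snr))[:20]    # top 20 genes by |SNR|
print(sorted(markers.tolist()))            # mostly recovers indices 0..19
```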

  8. K-nn classifier example: K = 5, 2 genes, 2 classes. Project the training samples into gene space. [Figure: samples plotted against gene 1 and gene 2, with the two classes shown in black and orange.]

  9. K-nn classifier example, continued: project the unknown sample into the same gene space. [Figure: the unknown sample (“?”) plotted among the black and orange samples against gene 1 and gene 2.]

  10. K-nn classifier example, continued: “consult” the 5 closest neighbors: 3 black, 2 orange, so the unknown sample is predicted black. Distance measures: Euclidean distance, 1 − Pearson correlation, … [Figure: the unknown sample and its 5 nearest neighbors in gene space.]
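
A minimal sketch of the slides’ K-nn example, assuming scikit-learn (the slides themselves use GenePattern modules); the toy coordinates are invented for illustration:

```python
# Sketch: K-nn with K = 5 on two genes and two classes. Euclidean distance
# is scikit-learn's default; metric="correlation" (with algorithm="brute")
# would give a 1 - Pearson-correlation style distance instead.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Training samples projected into gene space (gene 1, gene 2).
X_train = np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [0.9, 1.1], [1.2, 1.0],   # black
                    [3.0, 3.1], [2.9, 3.0], [3.2, 2.8], [3.1, 3.2], [2.8, 2.9]])  # orange
y_train = ["black"] * 5 + ["orange"] * 5

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print(knn.predict([[1.5, 1.4]]))  # the 5 nearest neighbors vote -> ['black']
```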

  11. Support Vector Machine (SVM). [Figure from Noble, Nat Biotech 2006.]
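
The slide is only a pointer to the Noble (2006) primer; as a companion, a minimal linear-SVM sketch on simulated expression-like data, again assuming scikit-learn:

```python
# Sketch: a linear SVM separating two simulated phenotype classes.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 100))     # 40 samples, 100 genes
y = np.array([0] * 20 + [1] * 20)
X[y == 1, :5] += 1.5               # 5 genes shifted between classes

svm = SVC(kernel="linear", C=1.0).fit(X, y)
print(svm.score(X, y))             # training accuracy (optimistic by design)
```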

  12. Weighted Voting (Slonim et al., RECOMB 2000). Mixture of Experts approach: each gene casts a vote for one of the possible classes, and the vote is weighted by a score assessing the reliability of the expert (in this case, the gene). The class receiving the highest total vote is the predicted class, and the vote margin can be used as a proxy for the probability of class membership (prediction strength). [Figure: a new sample compared with the class 1 and class 2 centroids across genes g1, g2, …, gn.]
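
A minimal sketch of weighted voting in this style, assuming the commonly described choices of a signal-to-noise weight per gene and a decision boundary halfway between the class means; the helper names (train_wv, predict_wv) and the simulated data are hypothetical:

```python
# Sketch: each gene votes with weight * (expression - boundary); positive
# votes favor class 0, negative votes favor class 1.
import numpy as np

def train_wv(X, y):
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    s0, s1 = X[y == 0].std(axis=0), X[y == 1].std(axis=0)
    weight = (m0 - m1) / (s0 + s1)    # signal-to-noise score per gene
    boundary = (m0 + m1) / 2          # halfway between the class means
    return weight, boundary

def predict_wv(x, weight, boundary):
    votes = weight * (x - boundary)
    v0, v1 = votes[votes > 0].sum(), -votes[votes < 0].sum()
    label = 0 if v0 > v1 else 1
    strength = abs(v0 - v1) / (v0 + v1)   # proxy for class-membership confidence
    return label, strength

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 100)); y = np.array([0] * 15 + [1] * 15)
X[y == 0, :10] += 2.0                 # 10 genes higher in class 0
weight, boundary = train_wv(X, y)
print(predict_wv(X[0], weight, boundary))   # -> (0, prediction strength)
```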

  13. Class prediction: computational methodology. Candidate methods: regression trees (CART), weighted voting, k-nearest neighbors, support vector machines. Expression data + known classes → assess gene-class correlation → feature selection → build predictor → test predictor by cross-validation → evaluate predictor on independent test set.

  14. Testing the classifier: evaluation on an independent test set. Build the classifier on the train set, then assess prediction performance on the test set; the aim is to maximize generalization and avoid overfitting. Performance measure: error rate = (# of cases misclassified) / (total # of cases).
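
A minimal sketch of this train/test evaluation, assuming scikit-learn and simulated data; the K-nn model is just a stand-in for whichever classifier was built:

```python
# Sketch: fit on the train split, report the error rate on the held-out test split.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 50)); y = np.array([0] * 30 + [1] * 30)
X[y == 1, :5] += 1.5

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33,
                                          stratify=y, random_state=0)
model = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
misclassified = int((model.predict(X_te) != y_te).sum())
print("error rate:", misclassified / len(y_te))
```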

  15. Testing the classifier: what if we don’t have an independent test set? Cross-validation (XV): split the dataset into n folds (e.g., 10 folds of 10 cases each). For each fold, train (i.e., build the model) on the other n − 1 folds (e.g., on 90 cases) and test (i.e., predict) on the left-out fold (e.g., on the remaining 10 cases); then combine the test results. With small sample sizes, leave-one-out XV is frequently used.
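
A minimal sketch of the fold loop just described, assuming scikit-learn; swapping StratifiedKFold for LeaveOneOut gives the leave-one-out variant mentioned for small sample sizes:

```python
# Sketch: 10-fold cross-validation; train on 9 folds, predict the left-out
# fold, and pool the misclassifications across folds.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 50)); y = np.array([0] * 50 + [1] * 50)
X[y == 1, :5] += 1.0

errors = 0
for train_idx, test_idx in StratifiedKFold(n_splits=10).split(X, y):
    model = KNeighborsClassifier(n_neighbors=5).fit(X[train_idx], y[train_idx])
    errors += int((model.predict(X[test_idx]) != y[test_idx]).sum())
print("cross-validated error rate:", errors / len(y))
```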

  16. Testing the classifier. [Figure: learning curves from leave-one-out cross-validation, ALL vs. MLL vs. AML.]

  17. Testing the classifier: error rate estimates. Evaluation on an independent test set gives the best error estimate. Cross-validation is needed when the sample size is small, and for model selection.

  18. Classification cookbook: start by splitting the data into a train and a test set (stratified), and “forget” about the test set until the very end. Explore different feature selection methods and different classifiers on the train set by XV. Once the “best” classifier and the best classifier parameters have been selected (based on XV), build a classifier with those parameters on the entire train set, then apply it to the test set.
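
A minimal end-to-end sketch of the cookbook, assuming scikit-learn rather than the GenePattern modules listed on the next slide; keeping feature selection inside the cross-validated pipeline mirrors the advice to leave the test set untouched until the end:

```python
# Sketch: stratified split; choose markers and model settings by 10-fold XV
# on the train set only; refit on the full train set; score the test set once.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(5)
X = rng.normal(size=(120, 200)); y = np.array([0] * 60 + [1] * 60)
X[y == 1, :10] += 1.0

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)

# Feature selection sits inside the pipeline so each XV fold reselects
# markers from its own training folds (no leakage from held-out cases).
pipe = Pipeline([("select", SelectKBest(f_classif)),
                 ("knn", KNeighborsClassifier())])
grid = {"select__k": [10, 50, 100], "knn__n_neighbors": [3, 5, 7]}
search = GridSearchCV(pipe, grid, cv=10).fit(X_tr, y_tr)

print("chosen settings:", search.best_params_)
print("test-set error rate:", 1 - search.score(X_te, y_te))
```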

  19. Classification GenePattern methods. Split the data into train and test sets: SplitDatasetTrainTest. Explore different feature selection methods and different classifiers on the train set by XV: CARTXValidation, KNNXValidation, WeightedVotingXValidation. Once the “best” classifier and the best classifier parameters have been selected (based on XV), run the corresponding module: CART, KNN, WeightedVoting. Examine the results: PredictionResultsViewer, FeatureSummaryViewer.

  20. Classification Example
