
Diagnosis of multiple cancer types by shrunken centroids of gene expression

By Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu. Course: 550.635 Topics in Bioinformatics. Presenter: Ting Yang. Teacher: Professor Geman.


Presentation Transcript


  1. Diagnosis of multiple cancer types by shrunken centroids of gene expression. By Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu. Course: 550.635 Topics in Bioinformatics. Presenter: Ting Yang. Teacher: Professor Geman.

  2. Nearest Centroid Classification
     Example: small round blue cell tumors of childhood
     • 63 training samples, 25 testing samples
     • 4 classes: BL, EWS, NB, RMS
     • Figure 1
     • Nearest centroid classification
     • Disadvantage
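The slide names nearest centroid classification without spelling it out. A minimal sketch, assuming plain squared Euclidean distance and NumPy arrays with samples in rows and genes in columns; the function names and the toy numbers are hypothetical and are not taken from the presentation:

```python
import numpy as np

def fit_centroids(X, y):
    """Return (classes, centroids); centroids has shape (n_classes, n_genes)."""
    classes = np.unique(y)
    centroids = np.vstack([X[y == k].mean(axis=0) for k in classes])
    return classes, centroids

def predict_nearest_centroid(X_new, classes, centroids):
    """Assign each row of X_new to the class whose centroid is closest."""
    # Squared Euclidean distance from every sample to every centroid.
    diffs = X_new[:, None, :] - centroids[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]

# Toy data (made up): 6 samples, 3 genes, 2 classes.
X = np.array([[1.0, 0.0, 5.0],
              [1.2, 0.1, 5.1],
              [0.8, -0.1, 4.9],
              [3.0, 0.0, 5.0],
              [3.1, 0.2, 5.2],
              [2.9, -0.2, 4.8]])
y = np.array(["A", "A", "A", "B", "B", "B"])

classes, centroids = fit_centroids(X, y)
X_new = np.array([[1.1, 0.0, 5.0], [2.8, 0.1, 5.0]])
pred = predict_nearest_centroid(X_new, classes, centroids)
```

Every gene contributes to the distance here, including genes that carry no class information.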

  3. Nearest Shrunken Centroids
     • A modification of the nearest centroid method
     • Idea: first normalize the class centroids by the within-class standard deviation for each gene, then shrink each class centroid towards the overall centroid.

  4. Details:
     • x̄_ik: mean expression value in class k for gene i
     • x̄_i: ith component of the overall centroid
     • s_i: pooled within-class standard deviation for gene i

  5. The statistic for gene i in class k measures the difference between the mean of gene i in class k and the mean of gene i in all classes combined.
     • Idea: a gene that discriminates one class from the rest will have a statistic of large absolute value.
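The formula itself did not survive in the transcript. A hedged sketch of the per-gene, per-class statistic, assuming the form d_ik = (x̄_ik - x̄_i) / (m_k (s_i + s_0)). The normalizer m_k is taken as sqrt(1/n_k + 1/n); some descriptions of the method use 1/n_k - 1/n instead, so that choice is an assumption. The offset s_0, set here to the median of the s_i, is likewise an assumption not stated on the slides. The function name and toy data are hypothetical:

```python
import numpy as np

def centroid_statistics(X, y):
    """Return d with shape (n_classes, n_genes): standardized centroid differences."""
    classes = np.unique(y)
    n, K = X.shape[0], len(classes)
    overall = X.mean(axis=0)                      # overall centroid, one value per gene
    centroids = np.vstack([X[y == k].mean(axis=0) for k in classes])
    n_k = np.array([(y == k).sum() for k in classes])

    # Pooled within-class standard deviation for each gene.
    ss = sum(((X[y == k] - centroids[j]) ** 2).sum(axis=0)
             for j, k in enumerate(classes))
    s = np.sqrt(ss / (n - K))

    s0 = np.median(s)                             # assumed offset (see lead-in)
    m = np.sqrt(1.0 / n_k + 1.0 / n)              # assumed normalizer (see lead-in)
    return (centroids - overall) / (m[:, None] * (s + s0)[None, :])

# Toy data (made up): gene 0 separates the classes, genes 1 and 2 do not.
X = np.array([[1.0, 0.0, 5.0],
              [1.2, 0.1, 5.1],
              [0.8, -0.1, 4.9],
              [3.0, 0.0, 5.0],
              [3.1, 0.2, 5.2],
              [2.9, -0.2, 4.8]])
y = np.array([0, 0, 0, 1, 1, 1])
d = centroid_statistics(X, y)
```

On this toy input the discriminating gene gets a statistic of large absolute value in both classes, while the other two genes stay near zero.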

  6. Shrink each statistic toward zero to eliminate the genes that do not provide sufficient information.
     • ‘De-noising’ step
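One common way to implement this kind of shrinkage is soft thresholding: reduce each statistic in absolute value by an amount delta and set it to zero if it would change sign. A minimal sketch, assuming that rule; the name `soft_threshold` and the example values are hypothetical:

```python
import numpy as np

def soft_threshold(d, delta):
    """Move every entry of d toward zero by delta, clipping at zero."""
    return np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)

# Rows are classes, columns are genes (made-up statistics).
d = np.array([[-4.5, 0.3, 0.0],
              [4.5, -0.3, 0.0]])
d_shrunk = soft_threshold(d, delta=1.0)

# A gene stays "active" only if at least one class keeps a nonzero statistic.
active = np.any(d_shrunk != 0, axis=0)
```

Genes whose statistics are zero in every class no longer distinguish the class centroids, which is what makes the step act as gene selection.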

  7. Choosing the amount of shrinkage
     • The shrinkage amount is allowed to vary over a wide range.
     • 10-fold cross-validation (choose the amount with the smallest error rate):
     • Divide the set of samples at random into 10 equal-sized parts, with the classes distributed proportionally among the 10 parts.
     • Fit the model on 90% of the samples, then predict the class labels of the remaining 10% (the test samples).
     • Repeat 10 times and add the errors together (overall error).
     • Figure 2
     • Figure 1
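The cross-validation steps above can be sketched as follows. This is an illustration under stated assumptions, not the presenter's code: `fit_nsc`, `predict_nsc`, `stratified_folds` and `cv_errors` are hypothetical helpers, the shrunken-centroid fit assumes soft thresholding with the normalizer sqrt(1/n_k + 1/n) and a median offset, and the data are synthetic.

```python
import numpy as np

def fit_nsc(X, y, delta):
    """Fit shrunken centroids (assumed form) for one shrinkage amount."""
    classes = np.unique(y)
    n, K = X.shape[0], len(classes)
    overall = X.mean(axis=0)
    cents = np.vstack([X[y == k].mean(axis=0) for k in classes])
    n_k = np.array([(y == k).sum() for k in classes])
    ss = sum(((X[y == k] - cents[j]) ** 2).sum(axis=0)
             for j, k in enumerate(classes))
    s = np.sqrt(ss / (n - K))
    scale = s + np.median(s)                      # s_i + s_0, s_0 assumed median
    m = np.sqrt(1.0 / n_k + 1.0 / n)[:, None]     # assumed normalizer
    d = (cents - overall) / (m * scale)
    d = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)
    return classes, overall + m * scale * d, scale, n_k / n

def predict_nsc(model, X_new):
    """Pick the class with the smallest standardized distance minus prior term."""
    classes, cents, scale, prior = model
    z = (X_new[:, None, :] - cents[None, :, :]) / scale
    score = (z ** 2).sum(axis=2) - 2.0 * np.log(prior)
    return classes[score.argmin(axis=1)]

def stratified_folds(y, n_folds, rng):
    """Deal the samples of each class at random across the folds."""
    fold = np.empty(len(y), dtype=int)
    for k in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == k))
        fold[idx] = np.arange(len(idx)) % n_folds
    return fold

def cv_errors(X, y, deltas, n_folds=10, seed=0):
    """Total number of misclassified held-out samples for each delta."""
    rng = np.random.default_rng(seed)
    fold = stratified_folds(y, n_folds, rng)
    errors = np.zeros(len(deltas), dtype=int)
    for f in range(n_folds):
        test = fold == f
        for j, delta in enumerate(deltas):
            model = fit_nsc(X[~test], y[~test], delta)
            errors[j] += (predict_nsc(model, X[test]) != y[test]).sum()
    return errors

# Synthetic data: 40 samples, 50 genes, only the first 5 genes differ by class.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 50))
X[y == 1, :5] += 3.0

deltas = np.array([0.0, 1.0, 2.0, 50.0])
errors = cv_errors(X, y, deltas)
best_delta = deltas[errors.argmin()]
```

With a very large delta every centroid collapses onto the overall centroid, so the error count for the last entry of `deltas` shows what happens when no gene survives.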

  8. More Figures
     • Figure 3
     • Figure 4

  9. Classification
     • A new sample is classified by comparing its expression profile with each shrunken centroid, over the 43 active genes.
     • Distance function: class prior information is included.
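A sketch of such a distance function, assuming the score for class k is the sum over genes of (x_i - shrunken centroid_ik)^2 / (s_i + s_0)^2, minus 2 log(prior_k), with the smallest score winning. The function name and all numbers below are hypothetical:

```python
import numpy as np

def discriminant_scores(x_new, shrunken_centroids, scale, priors):
    """Score one sample against every class; lower means closer."""
    # Standardize each gene by its scale (s_i + s_0) before squaring.
    z = (x_new[None, :] - shrunken_centroids) / scale[None, :]
    # The prior term lowers the score of classes that are more frequent.
    return (z ** 2).sum(axis=1) - 2.0 * np.log(priors)

# Two classes, two genes (made-up values).
shrunken = np.array([[1.0, 0.0],
                     [3.0, 0.0]])
scale = np.array([0.5, 0.5])
priors = np.array([0.5, 0.5])

scores = discriminant_scores(np.array([1.5, 0.2]), shrunken, scale, priors)
predicted = int(scores.argmin())
```

A gene whose shrunken centroid is identical in every class adds the same amount to every score, so restricting the sum to the active genes leaves the predicted class unchanged.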

  10. Statistical details:
     • t-statistic
     • Estimates of the class probabilities (Figure 5)
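One way to turn per-class scores into class probability estimates, by analogy with Gaussian discriminant analysis, is p_k = exp(-score_k / 2) / sum over l of exp(-score_l / 2). A minimal sketch under that assumption; the function name and the example scores are hypothetical:

```python
import numpy as np

def class_probabilities(scores):
    """Convert discriminant scores (lower is better) into probabilities."""
    a = -0.5 * np.asarray(scores, dtype=float)
    a -= a.max()          # subtract the maximum for numerical stability
    w = np.exp(a)
    return w / w.sum()

# Made-up scores for four classes.
probs = class_probabilities([2.5, 10.5, 30.0, 4.5])
```

Subtracting the maximum before exponentiating does not change the result but avoids underflow when the scores are large.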
