
Presentation Transcript


  1. Dimension Reduction-Based Penalized Logistic Regression for Cancer Classification Using Microarray Data. By L. Shen and E.C. Tan. Name of student: Kung-Hua Chang. Date: July 8, 2005. SoCalBSI, California State University at Los Angeles.

  2. Background • Microarray data have the characteristic that the number of samples is much smaller than the number of variables (genes). • This causes the “curse of dimensionality” problem. • To address this problem, many dimension reduction methods are used, such as Singular Value Decomposition (SVD) and Partial Least Squares (PLS).

  3. Background (cont’d) • Singular Value Decomposition and Partial Least Squares. • Given an m × n matrix X that stores all of the gene expression data, X can be approximated as shown below.
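
A standard way to write this approximation is the rank-k truncated SVD (the textbook form, not necessarily the exact notation used on the original slide):

$$
X \approx U_k \Sigma_k V_k^{T} = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{T}, \qquad k \ll \min(m, n),
$$

where U_k and V_k hold the first k left and right singular vectors, Σ_k is the diagonal matrix of the k largest singular values, and the resulting k components (or, analogously, the first k PLS components) serve as the reduced-dimension inputs to the classifier.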

  4. Background (cont’d)

  5. Background (cont’d) • Logistic regression and least squares regression. • Both fit a linear combination of the variables to a set of data points: least squares predicts the response directly, while logistic regression predicts the probability that a sample belongs to a class.
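
In their standard textbook forms (written here for illustration, not quoted from the slides), least squares chooses the coefficients that minimize the squared error,

$$
\hat{\beta}_{\mathrm{LS}} = \arg\min_{\beta} \; \| y - X\beta \|^{2},
$$

while logistic regression models the class probability

$$
p(y = 1 \mid x) = \frac{1}{1 + e^{-x^{T}\beta}},
$$

with β chosen to maximize the likelihood of the observed class labels.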

  6. Background (cont’d) • The difference is that the logistic regression equations are solved iteratively: a trial fit is adjusted repeatedly to improve the fit, and the iterations stop when the improvement from one step to the next is suitably small. • Least squares regression can be solved explicitly, in a single step (a sketch of both appears below).
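
The following is a minimal sketch of that difference, using NumPy on synthetic data; it is illustrative only, not the code or data used in the paper. Least squares is one linear solve, while logistic regression uses iteratively reweighted least squares (IRLS, i.e. Newton’s method) until the step size is small.

```python
import numpy as np

# Illustrative data: 50 samples described by 5 (reduced) components.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
beta_true = rng.normal(size=5)
y_cont = X @ beta_true + 0.1 * rng.normal(size=50)               # continuous response
y_bin = (X @ beta_true + rng.normal(size=50) > 0).astype(float)  # noisy binary labels

# Least squares regression: solved explicitly in a single step (normal equations).
beta_ls = np.linalg.solve(X.T @ X, X.T @ y_cont)

# Logistic regression: iteratively reweighted least squares (Newton's method).
beta = np.zeros(X.shape[1])
for _ in range(100):
    p = 1.0 / (1.0 + np.exp(-X @ beta))          # current fitted probabilities
    W = p * (1.0 - p)                            # IRLS weights
    grad = X.T @ (y_bin - p)                     # gradient of the log-likelihood
    hess = X.T @ (W[:, None] * X) + 1e-8 * np.eye(X.shape[1])  # tiny ridge for stability
    step = np.linalg.solve(hess, grad)
    beta += step
    if np.linalg.norm(step) < 1e-8:              # stop when the improvement is small
        break
```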

  7. Background (cont’d) • Penalized logistic regression is ordinary logistic regression except that a penalty term is added to the cost function, which keeps the coefficient estimates stable when there are many variables.
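
In its common ridge form (the specific penalty used by Shen and Tan may differ in detail), the fit maximizes a penalized log-likelihood:

$$
\ell_{\lambda}(\beta) = \sum_{i=1}^{m} \Big[ y_i \, x_i^{T}\beta - \log\!\big(1 + e^{x_i^{T}\beta}\big) \Big] \;-\; \frac{\lambda}{2}\,\|\beta\|^{2},
$$

where λ > 0 controls how strongly large coefficients are penalized.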

  8. Background (cont’d) • Support Vector Machine (SVM) • SVM tries to find a hyperplane that separates the different classes of data. • With a nonlinear kernel, it is not a linear model.
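
A minimal usage sketch, assuming scikit-learn is available (the paper does not specify which SVM implementation was used, so this is illustrative only):

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative data: 40 training and 10 test samples with 5 (reduced) components.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 5))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
X_test = rng.normal(size=(10, 5))

# An RBF kernel gives a nonlinear decision boundary;
# kernel="linear" would instead fit a linear separating hyperplane.
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
```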

  9. Hypothesis • Dimension reduction combined with penalized logistic regression performs better than the support vector machine and least squares regression.

  10. Data Analysis The table on this slide gives the number of training/testing cases in each of the seven publicly available cancer data sets.

  11. Data Analysis (cont’d)

  12. Data Analysis (cont’d)

  13. Data Analysis

  14. Data Analysis • In general, the partial least squares-based classifier uses less time than the singular value decomposition-based classifier.

  15. Data Analysis (cont’d) • Penalized logistic regression training requires solving a set of linear equations iteratively until convergence, while least squares regression training requires solving a set of linear equations only once, so it is expected that penalized logistic regression uses more time than least squares regression.

  16. Data Analysis (cont’d) • The overall time required by the partial least squares-based and SVD-based regression methods is much less than that of the support vector machine.

  17. Data Analysis

  18. Conclusion Dimension reduction combined with penalized logistic regression gives the best performance, compared to the support vector machine and least squares regression.

  19. References • [1] L. Shen and E.C. Tan, “Dimension Reduction-Based Penalized Logistic Regression for Cancer Classification Using Microarray Data,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, to appear June 2005. • [2] SoCalBSI: http://instructional1.calstatela.edu/jmomand2/ • [3] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, New York, 2001.
