Microarrays: A Comparison of Classification and Feature Selection Algorithms for Interpretation

Lynn H. Lee, Hiram Shaish, Eric A. Smith, Min C. Zhang

Presentation Transcript


  1. Microarrays: A Comparison of Classification and Feature Selection Algorithms for Interpretation Lynn H. Lee, Hiram Shaish, Eric A. Smith, Min C. Zhang

  2. Responsibilities
     • Lynn Lee studied and described the classification methods, and performed all the experiments that use KNN as the classification method.
     • Hiram Shaish studied and described the background of microarrays, and compiled and analyzed the experimental results.
     • Eric Smith programmed, tested, and described the data parser.
     • Min Zhang studied and described the feature selection methods, and performed all the experiments that use SVM as the classification method.
     • Each team member contributed to the writing and editing process.

  3. The Parser
     • Written in Perl
     • 100 lines of code, plus 90 lines of comments and blank lines
     • Converts GEO SOFT files into Weka ARFF files in 2 phases:
        • Parse the SOFT headers to generate some of the ARFF headers
        • Parse the SOFT matrix, generating the rest of the ARFF headers and the ARFF data matrix
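The two-phase conversion on slide 3 can be illustrated with a short sketch. This is not the authors' Perl parser: it is a Python illustration that assumes a simplified GDS-style SOFT file, where class labels come from `!subset_description` / `!subset_sample_id` lines (one subset type only) and the expression table sits between `!dataset_table_begin` and `!dataset_table_end`, with `ID_REF` and `IDENTIFIER` as its first two columns. The function names, the relation name `smoking`, and the treatment of `null` as a missing value are hypothetical choices.

```python
def parse_soft_headers(lines):
    """Phase 1: map each sample ID to a class label taken from the !subset_* lines."""
    sample_class = {}
    description = None
    for line in lines:
        if line.startswith("!dataset_table_begin"):
            break  # the header section ends where the data table starts
        key, _, value = line.partition(" = ")
        if key == "!subset_description":
            # ARFF nominal values cannot contain unquoted spaces, so replace them.
            description = value.strip().replace(" ", "_")
        elif key == "!subset_sample_id" and description is not None:
            for sample in value.split(","):
                sample_class[sample.strip()] = description
    return sample_class


def parse_soft_matrix(lines, sample_class, relation="smoking"):
    """Phase 2: read the gene x sample table and emit the remaining ARFF text."""
    header, genes, columns = None, [], {}
    in_table = False
    for line in lines:
        if line.startswith("!dataset_table_begin"):
            in_table = True
        elif line.startswith("!dataset_table_end"):
            break
        elif in_table:
            fields = line.rstrip("\n").split("\t")
            if header is None:
                header = fields  # assumed: ID_REF, IDENTIFIER, then one column per sample
            else:
                genes.append(fields[0])
                for sample, value in zip(header[2:], fields[2:]):
                    columns.setdefault(sample, []).append(value)
    out = [f"@RELATION {relation}", ""]
    out += [f"@ATTRIBUTE {gene} NUMERIC" for gene in genes]
    classes = sorted(set(sample_class.values()))
    out.append("@ATTRIBUTE class {" + ",".join(classes) + "}")
    out += ["", "@DATA"]
    # ARFF has one row per instance, so the gene x sample table is transposed here.
    for sample in header[2:]:
        # Assumed missing-value markers in SOFT; ARFF writes missing values as "?".
        values = ["?" if v in ("", "null") else v for v in columns[sample]]
        out.append(",".join(values + [sample_class[sample]]))
    return "\n".join(out)


def soft_to_arff(text, relation="smoking"):
    """Run both phases over the same SOFT text."""
    lines = text.splitlines()
    return parse_soft_matrix(lines, parse_soft_headers(lines), relation)
```

Phase 1 only reads the header section, so the class attribute's nominal values are known before any row of the matrix is written; phase 2 then supplies one `NUMERIC` attribute per gene and one data row per sample.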

  4. The Data
     • 75 samples
     • 22,215 genes
     • 3 classes: smokers, non-smokers, and former smokers (those who quit smoking)
     • Easy phenotype to verify
     • Caveats?

  5. Feature Selection
     • Information gain (Info Gain)
     • Chi-square
     • Numbers of features selected: 1, 2, 5, 10, 20, 50, 100, 200, 300, 400, 500
     • Results: both algorithms selected almost identical features
     • Reflects the ‘partitionability’ of the data set
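The two rankings on slide 5 can be illustrated with a short, dependency-free sketch. It scores each gene by information gain, IG = H(C) - H(C | X), and by the Pearson chi-square statistic, the sum of (O - E)^2 / E over the gene-by-class contingency table, then keeps the k best genes for each of the feature counts listed on slide 5. Expression values are continuous, so the sketch assumes a simple median split into two bins; that discretization and all helper names are assumptions, not necessarily what the authors' tool did. The hypothetical `overlap` helper shows one way to quantify "almost identical features selected".

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def binarize(values):
    """Assumed discretization: True if a value is at or above the (upper) median."""
    threshold = sorted(values)[len(values) // 2]
    return [v >= threshold for v in values]


def info_gain(values, labels):
    """IG = H(C) - H(C | X) for one binarized gene."""
    bins = binarize(values)
    n = len(labels)
    conditional = 0.0
    for b in (False, True):
        subset = [c for x, c in zip(bins, labels) if x == b]
        if subset:
            conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def chi_square(values, labels):
    """Pearson chi-square statistic of the (bin x class) contingency table."""
    bins = binarize(values)
    n = len(labels)
    observed = Counter(zip(bins, labels))
    bin_totals, class_totals = Counter(bins), Counter(labels)
    stat = 0.0
    for b, nb in bin_totals.items():
        for c, nc in class_totals.items():
            expected = nb * nc / n  # always > 0: both totals come from observed data
            stat += (observed[(b, c)] - expected) ** 2 / expected
    return stat


def top_k(matrix, labels, score, k):
    """Indices of the k highest-scoring genes; matrix[i] holds gene i across samples."""
    ranked = sorted(range(len(matrix)), key=lambda i: score(matrix[i], labels), reverse=True)
    return ranked[:k]


def overlap(selection_a, selection_b):
    """Fraction of genes shared by two equal-size top-k selections."""
    return len(set(selection_a) & set(selection_b)) / len(selection_a)
```

A gene that splits the classes cleanly gets the maximum score under both criteria, which is one reason the two rankings can agree closely on a data set whose classes partition well.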

  6. Classification
     • ECOC (error-correcting output codes)
     • KNN (k-nearest neighbors)
     • Paired the 2 classification algorithms with the 2 feature selection algorithms
     • Results:
        - KNN ‘out-classifies’ ECOC with fewer features (70% accuracy with 1 feature)
        - Highest accuracy as a function of the feature selection algorithm
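The two classifiers on slide 6 can be sketched as follows. KNN labels a sample by majority vote among its k nearest training samples; ECOC gives each class a binary codeword, trains one binary classifier per codeword bit, and predicts the class whose codeword is closest in Hamming distance to the predicted bits. In this Python sketch the 5-bit code matrix, the Euclidean distance, and the nearest-centroid rule used as the per-bit binary learner are all assumptions made to keep the example self-contained; the slides do not specify them.

```python
import math
from collections import Counter

# Hypothetical code matrix: one codeword per class, one column per binary problem.
CODES = {
    "smoker":        (1, 0, 0, 1, 1),
    "non-smoker":    (0, 1, 0, 1, 0),
    "former smoker": (0, 0, 1, 0, 1),
}


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training samples nearest to query."""
    nearest = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))[:k]
    # most_common breaks ties by first occurrence, i.e. in favor of nearer neighbors.
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def train_ecoc(train_x, train_y, codes):
    """Train one nearest-centroid binary model per code-matrix column."""
    n_bits = len(next(iter(codes.values())))
    models = []
    for bit in range(n_bits):
        zeros = [x for x, y in zip(train_x, train_y) if codes[y][bit] == 0]
        ones = [x for x, y in zip(train_x, train_y) if codes[y][bit] == 1]
        models.append((centroid(zeros), centroid(ones)))
    return models


def predict_ecoc(models, codes, query):
    """Predict each bit, then decode to the class with the nearest codeword."""
    bits = tuple(int(euclidean(query, c1) < euclidean(query, c0)) for c0, c1 in models)
    return min(codes, key=lambda cls: sum(b != c for b, c in zip(bits, codes[cls])))
```

Every pair of codewords in this matrix differs in at least 3 bits, so a single mispredicted bit still decodes to the correct class; that redundancy is the "error-correcting" part of ECOC.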

  7. Classification
     • Accuracy does not increase beyond a maximum potential, regardless of the number of features
     • Suggests an inherent characteristic of the data
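One way to locate the plateau described on slide 7 is to evaluate each top-k prefix of a feature ranking and report the smallest k that reaches the best accuracy. This sketch assumes a 1-NN classifier scored by leave-one-out cross-validation, which the slides do not state; `loo_1nn_accuracy`, `accuracy_curve`, and `plateau_size` are hypothetical helpers.

```python
def loo_1nn_accuracy(samples, labels, features):
    """Leave-one-out accuracy of 1-NN using only the given feature indices."""
    correct = 0
    for i, query in enumerate(samples):
        # Nearest other sample by squared Euclidean distance on the chosen features.
        best_j = min(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: sum((samples[j][f] - query[f]) ** 2 for f in features),
        )
        correct += labels[best_j] == labels[i]
    return correct / len(samples)


def accuracy_curve(samples, labels, ranking, sizes):
    """Map each feature count k in sizes to the accuracy of the top-k ranked features."""
    return {k: loo_1nn_accuracy(samples, labels, ranking[:k]) for k in sizes}


def plateau_size(curve, tolerance=0.0):
    """Smallest feature count whose accuracy is within tolerance of the best one."""
    best = max(curve.values())
    return min(k for k, acc in curve.items() if acc >= best - tolerance)
```

With `sizes` set to the feature counts from slide 5, `plateau_size(curve, tolerance=0.01)` reports the smallest count within one percentage point of the best accuracy; adding features beyond that point costs computation without improving the score.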
