PKNN/LSVM Approach for Gene Expression Analysis (P-trees Technology)

PKNN/LSVM Approach for Microarray Gene Expression Analysis (P-trees technology is patented by NDSU)

OUTLINE • Introduction • P-tree Technology • Our Approach • PODIUM KNN • Weight Optimization • Improving Accuracy by LSVM • Performance Study • Conclusion

PODIUM KNN • Dissimilarity measurement: $F(X,Y)=\sum_i w_i\, d(x_i,y_i)$, where $d(x_i,y_i)=|x_i-y_i|$ (weighted Manhattan distance) • Stage 1. Finding neighbors • Stage 2. Podium votes
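The two stages above can be sketched in a few lines of Python. This is a minimal illustration, not the P-tree implementation from the slides: it scans the training set directly, and it assumes that a "podium vote" means each neighbor votes with a weight that decays with its distance. The Gaussian decay, the `sigma` parameter, and the function names are assumptions made for the example.

```python
import math
from collections import defaultdict

def weighted_manhattan(x, y, w):
    """F(X, Y) = sum_i w_i * |x_i - y_i|."""
    return sum(wi * abs(xi - yi) for wi, xi, yi in zip(w, x, y))

def podium_knn(train, labels, query, w, k=3, sigma=1.0):
    # Stage 1: find the k nearest neighbors under the weighted distance.
    scored = [(weighted_manhattan(x, query, w), label)
              for x, label in zip(train, labels)]
    neighbors = sorted(scored, key=lambda t: t[0])[:k]
    # Stage 2: each neighbor votes with a weight that decays with distance.
    # The Gaussian form is an assumption chosen for illustration.
    votes = defaultdict(float)
    for dist, label in neighbors:
        votes[label] += math.exp(-(dist * dist) / (2 * sigma * sigma))
    return max(votes, key=votes.get)

train = [[0, 0], [0, 1], [5, 5], [6, 5]]
labels = ["a", "a", "b", "b"]
print(podium_knn(train, labels, [0.5, 0.5], w=[1, 1], k=3))
```

Setting a weight w_i to zero removes that gene from the distance entirely, which is why the choice of weights matters for the classifier.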

Optimizing Weights • A genetic algorithm (GA), as introduced by Goldberg (1989), is a randomized search and optimization technique capable of searching for optimal solutions. • Step 1. Partition the weight space • Step 2. Evaluation/Selection: 10-fold cross validation [Figure: population of bit-string chromosomes (1010 1110 … 1010), each passed through evaluation (eval)]

Optimizing Weights (cont.) • Step 3. Reproduction • Step 4. Mutation • Step 5. Go back to Step 2 until a stop condition is reached. [Figure: bit-string chromosomes (1010 1110 … 1010) undergoing reproduction (rep) and mutation (mut)]
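The five steps can be sketched as a small genetic algorithm over bit-string chromosomes. This is an illustrative sketch under stated assumptions, not the authors' code: each weight is encoded with `BITS` bits (one possible way to partition the weight space), reproduction is one-point crossover, the best chromosome is carried over unchanged, and the stop condition is a fixed number of generations. The `fitness` callback stands in for the 10-fold cross-validation accuracy used on the slides.

```python
import random

BITS = 4  # bits per weight: splits [0, 1] into 16 levels (assumed encoding)

def decode(chrom, n_weights):
    """Turn a list of bits into n_weights values in [0, 1]."""
    weights = []
    for i in range(n_weights):
        gene = chrom[i * BITS:(i + 1) * BITS]
        value = int("".join(map(str, gene)), 2)
        weights.append(value / (2 ** BITS - 1))
    return weights

def optimize_weights(fitness, n_weights, pop_size=20, generations=30,
                     mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    length = n_weights * BITS
    # Step 1: random population over the partitioned weight space.
    pop = [[rng.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):  # Step 5: repeat until the stop condition
        # Step 2: evaluate every chromosome and select the fitter half.
        pop.sort(key=lambda c: fitness(decode(c, n_weights)), reverse=True)
        parents = pop[:pop_size // 2]
        children = [parents[0][:]]  # carry the best chromosome over unchanged
        while len(children) < pop_size:
            # Step 3: reproduction by one-point crossover of two parents.
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, length - 1)
            child = a[:cut] + b[cut:]
            # Step 4: mutation flips each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        pop = children
    best = max(pop, key=lambda c: fitness(decode(c, n_weights)))
    return decode(best, n_weights)

# Toy fitness: prefer weights close to a hypothetical target vector.
target = [1.0, 0.0, 1.0]
toy_fitness = lambda w: -sum((wi - ti) ** 2 for wi, ti in zip(w, target))
print(optimize_weights(toy_fitness, n_weights=3))
```

In practice the fitness call dominates the cost, since every evaluation means a full cross-validation run of the classifier with the decoded weights.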

Optimal KNN/LSVM • Why LSVM: A lesson from KddCup02 [Figure: Class 1 and Class 2 separated by the optimal boundary, with the optimal margin marked]

Optimal KNN/LSVM (cont.) • EIN-ring membership • C: component • R: radius • Support vector pair • Boundary sentry • Boundary hyperplane • Step 1. Finding support vector pairs • Step 2. Fitting the boundary hyperplane [Figure: scatter of + and - class points, with points marked * and # where the classes meet]
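A rough sketch of the two steps follows. The slides do not define a support vector pair precisely, so this example assumes one reading: the closest pair of points drawn from opposite classes, with the boundary hyperplane taken as the perpendicular bisector of that pair. It uses a brute-force search instead of EIN-ring membership on P-trees, and all function names are hypothetical.

```python
def closest_opposite_pair(pos, neg):
    """Step 1 (assumed reading): nearest pair of points from opposite classes."""
    return min(
        ((p, n) for p in pos for n in neg),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(*pair)),
    )

def fit_hyperplane(p, n):
    """Step 2 (assumed): perpendicular bisector of the pair as (normal, offset)."""
    normal = [a - b for a, b in zip(p, n)]
    midpoint = [(a + b) / 2 for a, b in zip(p, n)]
    offset = -sum(w * m for w, m in zip(normal, midpoint))
    return normal, offset

def classify(x, normal, offset):
    # Points on the positive side of the hyperplane get the "+" class.
    score = sum(w * xi for w, xi in zip(normal, x)) + offset
    return "+" if score >= 0 else "-"

pos = [(2, 2), (3, 3)]
neg = [(0, 0), (-1, -1)]
p, n = closest_opposite_pair(pos, neg)
normal, offset = fit_hyperplane(p, n)
print(classify((3, 1), normal, offset), classify((0, 1), normal, offset))
```

A single pair gives only a local estimate of the boundary; a fuller version would presumably combine several pairs near the query point.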

Optimal KNN/LSVM (cont.) • Robust for data sets with noise [Figure: Class 1 and Class 2 separated by the optimal boundary, with the optimal margin marked]

Implementation • Model structure design [Diagram labels: data; DCI Model; GA Model; Basic P-trees; w1, w2, …, wd; Cuboids Model; gw1, …, (wi, wj), …, gwk; Sorting w.r.t. avg(gw); HOBBit/EINring; EINring Formulation; Execution using PDM; PDM Model]

Data Sets of Bioinformatics • DS1. Leukemia data, size 6817 × 72 (http://llmpp.nih.gov/lymphoma/) • DS2. Colon cancer data (Alon 1999), size 2000 × 62 • DS3. NCI60, size 1376 × 60 • DS4. Yeast sporulation data set (Chu et al. 1998), time series data (http://cmgm.stanford.edu/pbrown/sporulation/)

Performance Study • Accuracy Comparison

Performance Study (cont.) • Influence of noise

Performance Study (cont.) • Influence of GA parameters

Conclusion
