1. An Introduction to Support Vector Machine Classification 
2. Outline What do we mean by classification, and why is it useful?
Machine learning: basic concepts
Support Vector Machines (SVM)
Linear SVM: basic terminology and some formulas
Non-linear SVM: the kernel trick
An example: Predicting protein subcellular location with SVM
Performance measurements
3. Classification Every day, all the time, we classify things.
E.g. crossing the street:
Is there a car coming?
At what speed?
How far is it to the other side?
Classification: Safe to walk or not!!! 
5. Classification tasks in Bioinformatics  
6. Problems in classifying biological data The data are often high-dimensional.
It is hard to formulate simple rules.
There are large amounts of data.
We need automated ways to deal with the data.
Use computers: data processing, statistical analysis, and learning patterns from the data (Machine Learning)
8. Black box view of Machine Learning
9. Tennis example 2 
10. Linear Support Vector Machines  
11. Linear SVM 2 
12. Definitions 
13. Maximizing the margin 
14. The Lagrangian trick 
15. Problems with linear SVM 
16. Non-linear SVM 1 
17. Non-linear SVM 2
18. Solving the optimization problem In many cases any general-purpose optimization package that solves linearly constrained quadratic problems will do.
Newton's method
Conjugate gradient descent
Other methods involve nonlinear programming techniques.
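For orientation, this is the kind of problem such a package would be handed. It is the standard hard-margin dual of the linear SVM, given here as a textbook sketch rather than as the formulation used on the slides, whose own formulas did not survive. The symbols are assumptions: x_i are the n training points, y_i in {-1, +1} their class labels, and alpha_i the Lagrange multipliers.

```latex
\max_{\alpha}\; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, y_i y_j \, (x_i \cdot x_j)
\quad \text{subject to} \quad
\alpha_i \ge 0 \;\; (i = 1, \dots, n), \qquad \sum_{i=1}^{n} \alpha_i y_i = 0 .
```

The objective is quadratic in the alpha_i and both constraints are linear, which is why a generic quadratic programming solver is usually enough. In the non-linear case the dot product x_i · x_j is replaced by a kernel function K(x_i, x_j).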
19. Overtraining/overfitting 
20. Overtraining/overfitting 2 Example with a gardener.
21. A practical example, protein localization Proteins are synthesized in the cytosol.
They are then transported to different subcellular locations, where they carry out their functions.
Aim: To predict in which location a given protein will end up!!!
22. Subcellular Locations 
23. Method Hypothesis: The amino acid composition of proteins from different compartments should differ.
Extract proteins with known subcellular location from SWISSPROT.
Calculate the amino acid composition of the proteins.
Try to differentiate between cytosolic, extracellular, mitochondrial and nuclear proteins using SVM
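The composition step can be sketched as below. This is a minimal illustration, assuming composition means the fraction of each of the 20 standard amino acids in the sequence. The function name `amino_acid_composition`, the ordering of the letters, and the toy sequence are hypothetical and not taken from the slides or from SWISSPROT.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (ordering is an arbitrary choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    The result is a list of 20 floats ordered as in AMINO_ACIDS.
    Characters outside the 20 standard letters (e.g. 'X') are ignored.
    """
    counts = Counter(r for r in sequence.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, used only to show the shape of the output.
features = amino_acid_composition("MKKLLA")
```

Each protein then becomes a fixed-length vector of 20 numbers that sum to 1, regardless of sequence length, which is the kind of input an SVM can work with.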
24. Input encoding 
25. Cross-validation 
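The slide does not say which cross-validation scheme was used, so the following is only a generic sketch of k-fold cross-validation. The data are split into k parts, and each part is held out once for testing while the model is trained on the rest. The function name `k_fold_splits` and the choice of 10 samples and 5 folds are illustrative assumptions.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Samples are taken in their given order; shuffle the data beforehand
    if the order carries information (e.g. sorted by class).
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Hypothetical sizes: 10 samples split into 5 folds of 2.
splits = list(k_fold_splits(10, 5))
```

Every sample ends up in exactly one test fold, so averaging the test performance over the folds gives an estimate of how well the model generalizes to unseen data.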
26. Performance measurements
27. Results We definitely get some predictive power out of our models.
There seems to be a difference in the composition of proteins from different subcellular locations.
Another question: What about nuclear proteins? Is there a difference between DNA-binding proteins and others???
28. Conclusions We have (hopefully) learned some basic concepts and terminology of SVM.
We know about the risk of overtraining and how to estimate the risk of poor generalization.
SVMs can be useful, for example, in predicting the subcellular location of proteins.
29. You can't input anything into a learning machine!!!