This presentation shows how genetic algorithms can identify the protein features most predictive of enzyme function. It covers Enzyme Commission (EC) numbers, their classification and nomenclature, the EC wheel figure, and the balance between local and global information.
Genetic Algorithms Select Protein Features Most Predictive of Enzyme Function
Andrew Kernytsky, Burkhard Rost
Columbia University
Enzyme function prediction
Task: given a protein sequence, predict its Enzyme Commission (EC) number.
Top-level EC classes (EC wheel figure): Oxidoreductases (EC 1), Transferases (EC 2), Hydrolases (EC 3), Lyases (EC 4), Isomerases (EC 5), Ligases (EC 6).
NC-IUBMB (1992). Recommendations of the International Union of Biochemistry on the Nomenclature and Classification of Enzymes. In: Enzyme Nomenclature. Academic Press, New York.
EC wheel figure: Porter CT, Bartlett GJ, Thornton JM. Nucleic Acids Res. 2004 Jan 1; 32: D129–D133.
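An EC number has four dot-separated levels, from the top-level class down to the individual enzyme. The sketch below is a minimal illustration, not the authors' code: it splits a fully specified EC number into its levels and names the top-level class. The helper names `parse_ec`, `ec_prefix` and `ec_class_name` are hypothetical, and partial numbers such as `3.4.-.-` are deliberately not handled.

```python
# Top-level EC classes listed on the slide, numbered as in the 1992 nomenclature.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def parse_ec(ec: str) -> list[int]:
    """Split a fully specified EC number such as '3.4.21.4' into its four levels."""
    parts = ec.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four levels, got {ec!r}")
    return [int(p) for p in parts]  # int('-') raises, so partial numbers are rejected


def ec_prefix(ec: str, level: int) -> str:
    """Truncate an EC number to its first `level` levels, e.g. level 2 -> '3.4'."""
    return ".".join(str(d) for d in parse_ec(ec)[:level])


def ec_class_name(ec: str) -> str:
    """Name of the top-level class (first digit)."""
    return EC_CLASSES[parse_ec(ec)[0]]


if __name__ == "__main__":
    print(ec_class_name("3.4.21.4"))  # trypsin -> Hydrolases
    print(ec_prefix("3.4.21.4", 2))   # 3.4
```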
Limited local information
[Figure panels: "All global" and "All intersection" feature classes]
Example segment with per-residue feature tracks:
AA      TAGHCVNYDYGAGCQSGSPV
Acc     bbbbbieeeiibbieeeeee
Cons    ..|....|......||....
Feat 4  HHHEEEEELLEEEEELLLLL
Feat 5  iiibbbbbbboooobbbbbb
Feat 6  36788842100000000123
Intersection properties capture local information.
Significant risk of overfitting during training: the 10^3+ features outnumber the 10^2 positive samples.
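One way to read the contrast between global and intersection features, which is an assumption here since the slide shows only the tracks: a global feature summarizes the whole protein (for example amino-acid composition, 20 values), while an intersection feature counts co-occurrences of two per-residue properties (for example AA×sec, 20 × 3 = 60 values) and so retains some local context. The sketch treats the `Feat 4` track (H/E/L) as secondary structure; the function names are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
SEC_STATES = "HEL"  # helix, strand, loop, as in the example track (assumed meaning)


def global_composition(seq: str) -> dict[str, float]:
    """Global feature class: amino-acid composition of the whole protein (20 values)."""
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}


def intersection_composition(seq: str, sec: str) -> dict[str, float]:
    """Intersection feature class AA x sec: frequency of each (residue, state) pair (60 values)."""
    if len(seq) != len(sec):
        raise ValueError("tracks must be aligned residue by residue")
    pairs = Counter(zip(seq, sec))
    return {aa + s: pairs[(aa, s)] / len(seq) for aa, s in product(AMINO_ACIDS, SEC_STATES)}


if __name__ == "__main__":
    seq = "TAGHCVNYDYGAGCQSGSPV"  # AA track from the slide
    sec = "HHHEEEEELLEEEEELLLLL"  # Feat 4 track, read here as secondary structure
    print(global_composition(seq)["G"])              # 0.2: 4 of 20 residues are glycine
    print(intersection_composition(seq, sec)["GE"])  # 0.1: 2 of those glycines are in strands
```

Each additional track multiplies the feature count (AA×sec×Acc would already give 20 × 3 × 3 = 180 values), which is one way feature counts can climb into the 10^3+ range the slide warns about.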
Algorithm overview: genetic algorithm
Genomes: all possible combinations of feature classes, drawn from all intersection and global feature classes (e.g., AA, sec, AA×sec).
Fitness: for each genome, the protein sequence (e.g., M S N L L K D F E V A Q C) is encoded with the selected feature classes and passed to an inner learning algorithm (a neural network or an SVM); the resulting score is the genome's fitness (example values: 0.635, 0.688, 0.677).
GA evolution: selection, crossover, and mutation turn each generation's population into the next (1st through 4th generation shown).
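The slide gives only the outline of the search. The sketch below fills in common textbook choices, all of them assumptions rather than the authors' settings: a bit-string genome with one bit per feature class, tournament selection, one-point crossover, bit-flip mutation and elitism. The class list and `toy_fitness` are hypothetical; in the method described, fitness would come from training the inner neural network or SVM on the selected features.

```python
import random

# Hypothetical feature classes; "sec" is assumed to be secondary structure, "acc" accessibility.
FEATURE_CLASSES = ["AA", "sec", "acc", "AA×sec", "AA×acc"]

Genome = tuple[bool, ...]  # one bit per feature class: use it or not


def toy_fitness(genome: Genome) -> float:
    """Stand-in for training the inner learner and returning its validation score.
    Made-up numbers: each class adds a fixed benefit, and every selected class
    costs 0.06, loosely mimicking overfitting when features outnumber samples."""
    benefit = [0.10, 0.05, 0.04, 0.12, 0.08]
    gain = sum(b for b, used in zip(benefit, genome) if used)
    return 0.5 + gain - 0.06 * sum(genome)


def tournament(pop: list[Genome], fitness, rng: random.Random, k: int = 3) -> Genome:
    """Selection: the fittest of k randomly drawn genomes."""
    return max(rng.sample(pop, k), key=fitness)


def evolve(fitness, n_bits: int, pop_size: int = 20, generations: int = 30,
           p_mut: float = 0.1, seed: int = 0) -> tuple[Genome, list[float]]:
    rng = random.Random(seed)
    pop = [tuple(rng.random() < 0.5 for _ in range(n_bits)) for _ in range(pop_size)]
    history = []  # best fitness in each generation
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        next_pop = ranked[:2]  # elitism: carry the two best genomes over unchanged
        while len(next_pop) < pop_size:
            a = tournament(ranked, fitness, rng)
            b = tournament(ranked, fitness, rng)
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation: each bit flips with probability p_mut
            child = tuple(bit != (rng.random() < p_mut) for bit in child)
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness), history


if __name__ == "__main__":
    best, history = evolve(toy_fitness, len(FEATURE_CLASSES))
    print([name for name, used in zip(FEATURE_CLASSES, best) if used], round(history[-1], 3))
```

Because the two best genomes always survive, the best fitness never decreases from one generation to the next; swapping `toy_fitness` for a function that trains and cross-validates a real classifier leaves the loop unchanged.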
GA improves performance
[Figure: performance by EC level]
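The slide does not define how performance is measured at each EC level. One common convention, assumed here, counts a prediction as correct at level k when its first k EC digits match the true EC number. The function name and the predictions in the example are made up for illustration.

```python
def accuracy_at_level(true_ecs: list[str], pred_ecs: list[str], level: int) -> float:
    """Fraction of proteins whose predicted EC number agrees with the true one
    in its first `level` digits (1 = class only, 4 = full EC number)."""
    if len(true_ecs) != len(pred_ecs) or not true_ecs:
        raise ValueError("need two equally long, non-empty lists")
    hits = sum(t.split(".")[:level] == p.split(".")[:level]
               for t, p in zip(true_ecs, pred_ecs))
    return hits / len(true_ecs)


if __name__ == "__main__":
    true = ["3.4.21.4", "2.7.11.1", "1.1.1.1"]
    pred = ["3.4.21.5", "2.7.1.1", "1.1.1.1"]  # made-up predictions
    for level in range(1, 5):
        print(level, round(accuracy_at_level(true, pred, level), 3))
```

Run as a script, this prints 1.0, 1.0, 0.667 and 0.333 for levels 1 to 4: accuracy can only fall as more digits must match, which is why results are usually reported separately per level.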
Balance between intersection and global features gives best performance