SMA5422: Special Topics in Biotechnology
Lecture 8: Machine learning techniques in sequence analysis
• Introduction
• Methods
• Application examples for sequence analysis
Introduction to Machine Learning
• Goal:
  • To “improve” (gain knowledge, enhance computing capability).
• Tasks:
  • Forming concepts by data generalization.
  • Compiling knowledge into a compact form.
  • Finding useful explanations for valid concepts.
  • Clustering data into classes.
• Reference:
  • Machine Learning in Molecular Biology Sequence Analysis.
• Internet links:
  • http://www.ai.univie.ac.at/oefai/ml/ml-resources.html
Introduction to Machine Learning
• Categories:
  • Inductive learning.
    • Forming concepts from data without much domain knowledge (learning from examples).
  • Analytic learning.
    • Using existing knowledge to derive new, useful concepts (explanation-based learning).
  • Connectionist learning.
    • Using artificial neural networks to search for or represent concepts.
  • Genetic algorithms.
    • Searching for the most effective concept by means of Darwin’s “survival of the fittest” approach.
Machine Learning Methods
Inductive learning: Concept learning and example-based learning
Concept learning:
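Learning a concept from examples can be illustrated with a small sketch. The slide does not name a specific algorithm, so the code below uses a simple Find-S style generalization as a stand-in: it starts from the first positive example and relaxes every attribute that disagrees with a later positive example to a wildcard `"?"`. The attribute values (`"small"`, `"polar"`, and so on) and the function names are hypothetical and chosen only for illustration.

```python
def generalize(hypothesis, example):
    """Keep attributes that agree; replace disagreeing ones with the wildcard '?'."""
    return tuple(h if h == e else "?" for h, e in zip(hypothesis, example))


def learn_concept(positives):
    """Find-S style learner: the most specific hypothesis covering all positive examples."""
    hypothesis = positives[0]
    for example in positives[1:]:
        hypothesis = generalize(hypothesis, example)
    return hypothesis


def covers(hypothesis, example):
    """True if every attribute of the example matches the hypothesis or a wildcard."""
    return all(h == "?" or h == e for h, e in zip(hypothesis, example))


# Hypothetical training data: (size, polarity, charge) descriptors.
positives = [
    ("small", "polar", "uncharged"),
    ("small", "polar", "charged"),
]
negative = ("large", "nonpolar", "uncharged")

concept = learn_concept(positives)
```

Here `concept` generalizes only the charge attribute, and `covers(concept, negative)` can be used to check that the learned concept still excludes the negative example.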
Machine Learning Methods Analytic learning:
Machine Learning Methods Neural network:
Machine Learning Methods Genetic algorithms:
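The “survival of the fittest” search can be sketched as a minimal genetic algorithm. This is an illustrative toy under stated assumptions, not the method from the lecture: candidate concepts are encoded as bit strings, the fitness function simply counts 1 bits, and the parameter names and values (`pop_size`, `mutation_rate`, `seed`) are made up for the example.

```python
import random


def fitness(bits):
    """Toy fitness (an assumption): the number of 1 bits in the candidate."""
    return sum(bits)


def evolve(n_bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    # Random initial population of bit strings.
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Selection: rank by fitness and keep the fitter half unchanged.
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            # Crossover: splice two distinct surviving parents at a random cut point.
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_bits)
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = survivors + children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve()
```

Because the fitter half of each generation is carried over unchanged, the best fitness recorded in `history` never decreases from one generation to the next.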
Machine Learning in Sequence Analysis
• Example:
  • Protein secondary structure prediction:
• Procedure:
  • Amino acids are classified according to chemical property.
  • Each amino acid in a sequence is represented by a set of descriptors.
  • Rules are generated based on positive and negative examples.
• The learned rules:
  • Descriptor 1, Descriptor 2, …, Descriptor n -> Secondary Structure Type
  • Tiny or Polar, Large, Aromatic or M, Large and Non-negative -> Helix
• Accuracy achieved: 60%
• Reference: Progress in Machine Learning: Proc. 2nd European Working Session on Learning, pp. 230-250.
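How a learned rule of this form is applied to a sequence can be sketched as follows. The rule is read here as one descriptor test per consecutive residue, which is an interpretation of the slide rather than something it states. The property sets (`TINY`, `POLAR`, `LARGE`, `AROMATIC`, `NEGATIVE`) are simplified, assumed assignments for illustration only and are not the classification used in the cited work.

```python
# Assumed, simplified property classes for one-letter amino acid codes.
TINY = set("AGS")
POLAR = set("DEHKNQRST")
LARGE = set("EFHIKLMQRWY")
AROMATIC = set("FHWY")
NEGATIVE = set("DE")

# "Tiny or Polar, Large, Aromatic or M, Large and Non-negative -> Helix",
# written as one test per consecutive sequence position.
HELIX_RULE = [
    lambda aa: aa in TINY or aa in POLAR,
    lambda aa: aa in LARGE,
    lambda aa: aa in AROMATIC or aa == "M",
    lambda aa: aa in LARGE and aa not in NEGATIVE,
]


def find_rule_matches(sequence, rule):
    """Return the start index of every window in which each residue passes its test."""
    width = len(rule)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if all(test(aa) for test, aa in zip(rule, window)):
            hits.append(start)
    return hits


helix_starts = find_rule_matches("ASLMKA", HELIX_RULE)
```

Each index in `helix_starts` marks a four-residue window that the rule would label as helix under the assumed property sets.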
Homework
• Read the references about machine learning given in the lecture.
• Read at least one of the following references about SVM in biology:
  • Bioinformatics 16, 906-914 (2000)
  • Bioinformatics 17, 721-728 (2001)
  • Bioinformatics 17, 349-358 (2001)