
An algorithm to guide selection of specific biomolecules to be studied by wet-lab experiments

Jessica Wehner and Madhavi Ganapathiraju, Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA. Presented by Thahir P. Mohamed.


Presentation Transcript


  1. An algorithm to guide selection of specific biomolecules to be studied by wet-lab experiments. Jessica Wehner and Madhavi Ganapathiraju, Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA. Presented by Thahir P. Mohamed. Advancing Practice, Instruction & Innovation through Informatics, October 19-23, 2008.

  2. Protein Structure
     • Primary structure: chain of amino acids
     • Secondary structure: sub-structures such as helices and strands
     • Tertiary structure: protein structure at atomic resolution
     Protein structure is essential for successful design of drugs.

  3. Challenges in Protein Structure Prediction
     • X-ray crystallography and NMR spectroscopy are wet-lab methods to determine structure.
     • Very expensive
     • Very time consuming
     • Computational techniques are applied to predict protein structure.

  4. Computational Protein Structure Prediction
     • Machine learning techniques are applied to predict structure.
     • Experimentally determined structures are used to learn to predict new structures.
     • When there is not enough data to learn from, active learning is applied to select the next protein to be studied experimentally.

  5. Active Learning (figure: unlabeled proteins and their possible labels)

  6. Active Learning (figure: unlabeled proteins are clustered; clustered proteins and possible labels)

  7. Active Learning (figure: unlabeled proteins are clustered; a selection algorithm is applied to the clustered proteins; possible labels)

  8. Active Learning (figure: unlabeled proteins are clustered; a selection algorithm is applied to the clustered proteins; possible labels)

  9. Active Learning (figure: unlabeled proteins are clustered; the selection algorithm and prediction yield labeled proteins; possible labels)
     Active learning guides selection of data points for which you ask for labels.

  10. Membrane Protein Structure Prediction: membrane protein importance and challenges
      Membrane proteins:
      • 30% of genes
      • Cell regulation and signaling pathways
      • 60% of drug targets
      Yet:
      • Difficult to study experimentally
      • 1% of known protein structures
      Active learning can be used as a tool against the limited number of known membrane protein (MP) structures despite the large number of known MP sequences.

  11. ‘Features’ Representation
      Position: 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7
      Residue:  A L H W R A A G A A T V L L V I V E R G A P G A Q L I
      Topology: - - - - - M M M M M M M M M M M M - - - - - - - - - -
      Charge:   - - p - p - - - - - - - - - - - - n p - - - - - - - -
      E-Prop:   D d . . A D D . D D a d d d d d d D A . D D . D a d d
      Properties: charge, size, polarity, aromaticity, electronic properties.
      Data reduction is performed by singular value decomposition (SVD), resulting in a final 4 features per window.
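
The Charge row on this slide can be reproduced with a short sketch. This is only an illustration under stated assumptions: the slide marks H and R as positive and E as negative, while treating K as positive and D as negative is an assumption, as are the window size of 5, the count-based encoding, and the names `charge_track`, `windows`, and `window_counts`. The SVD step that reduces each window to 4 features is not shown.

```python
# Hypothetical charge classes. H, R ('p') and E ('n') follow the slide;
# K and D are added as an assumption.
POSITIVE = set("HKR")
NEGATIVE = set("DE")


def charge_track(sequence):
    """Return one charge symbol per residue: 'p', 'n', or '-' (neutral)."""
    return "".join(
        "p" if aa in POSITIVE else "n" if aa in NEGATIVE else "-"
        for aa in sequence
    )


def windows(track, size):
    """Yield every contiguous window of `size` symbols from the track."""
    for start in range(len(track) - size + 1):
        yield track[start:start + size]


def window_counts(window, alphabet="pn-"):
    """Encode a window as the count of each symbol (a simple feature vector)."""
    return [window.count(symbol) for symbol in alphabet]


sequence = "ALHWRAAGAATVLLVIVERGAPGAQLI"  # residues shown on the slide
track = charge_track(sequence)
# Window size 5 is an illustrative choice; the slides do not state it.
features = [window_counts(w) for w in windows(track, size=5)]
```

Each property row (size, polarity, and so on) could be encoded the same way and concatenated per window before any dimensionality reduction.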

  12. Clustering the Data (figure axes: Dim 1, Dim 2, Dim 3)
      • Neural network self-organizing map (SOM)
      • Finds centroids of clusters in the data
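
As a rough illustration of how a SOM finds cluster centroids, here is a minimal one-dimensional SOM in plain Python. The slides do not give the map size, learning-rate schedule, or neighbourhood function, so the linear decay, the Gaussian neighbourhood, and the names `train_som` and `best_matching_unit` are all assumptions made for this sketch.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(nodes, x):
    """Index of the node whose weight vector is closest to x."""
    return min(range(len(nodes)), key=lambda j: sq_dist(nodes[j], x))


def train_som(data, n_nodes, epochs=50, lr0=0.5, seed=0):
    """Train a 1-D SOM; returns the node weight vectors (cluster centroids)."""
    rng = random.Random(seed)
    # Initialise each node from a randomly chosen data point (copied).
    nodes = [list(rng.choice(data)) for _ in range(n_nodes)]
    radius0 = max(n_nodes / 2.0, 1.0)
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                     # linear decay (assumption)
            radius = max(radius0 * (1.0 - frac), 0.5)   # shrinking neighbourhood
            bmu = best_matching_unit(nodes, x)
            for j, w in enumerate(nodes):
                # Gaussian neighbourhood measured along the 1-D node grid.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return nodes


# Two well-separated toy groups of 3-D points (made-up data).
blob_a = [[0.1 * i, 0.0, 0.0] for i in range(5)]
blob_b = [[10.0 + 0.1 * i, 10.0, 10.0] for i in range(5)]
nodes = train_som(blob_a + blob_b, n_nodes=2)
assignments = [best_matching_unit(nodes, p) for p in blob_a + blob_b]
```

After training, each window is assigned to its best-matching node, and the node weights serve as the cluster centroids used by the selection designs.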

  13. Design 1: Density-based Selection
      • Find the densest cluster
      • Choose the N points closest to its centroid
      • Find labels for these points (transmembrane, TM, or non-transmembrane, NTM)
      • Find the majority label, say L
      • Assign L to all points in the cluster
      • Repeat for the next-densest cluster
      Clusters with no known structures are marked for study by experiments.

  14. Design 1 Results
      • Increase the number of data points for which we ask for structure.
      • Compare how accuracy varies between guided selection (via active learning) and random selection.
      • A total of only 10 labels per node, about 1% of the data.
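
The density-based steps can be sketched as follows. This is an illustration rather than the authors' implementation: cluster size stands in for density, the `oracle` callback stands in for looking up an experimentally known label (returning `None` when no structure is known), and the name `density_based_labels` and the toy data are hypothetical.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def density_based_labels(points, assignments, centroids, oracle, n_queries):
    """Propagate the majority label of the N most central points per cluster.

    Returns (predicted, unresolved): a dict of point index -> label, and the
    list of clusters with no known labels (candidates for experiments).
    """
    members = {}
    for idx, cluster in enumerate(assignments):
        members.setdefault(cluster, []).append(idx)

    predicted, unresolved = {}, []
    # Visit clusters from most to least populated (size as a density proxy).
    for cluster in sorted(members, key=lambda c: len(members[c]), reverse=True):
        idxs = members[cluster]
        nearest = sorted(
            idxs, key=lambda i: sq_dist(points[i], centroids[cluster])
        )[:n_queries]
        labels = [lab for lab in (oracle(i) for i in nearest) if lab is not None]
        if not labels:
            unresolved.append(cluster)  # no known structure in this cluster
            continue
        majority = Counter(labels).most_common(1)[0][0]
        for i in idxs:
            predicted[i] = majority
    return predicted, unresolved


# Toy 1-D example with made-up labels.
points = [[0.0], [0.1], [0.2], [5.0], [10.0], [10.2], [20.0]]
assignments = [0, 0, 0, 0, 1, 1, 2]
centroids = [[0.1], [10.1], [20.0]]
truth = ["TM", "TM", "TM", "NTM", "NTM", "NTM", None]
predicted, unresolved = density_based_labels(
    points, assignments, centroids, lambda i: truth[i], n_queries=2
)
```

In the toy data, point 3 sits far from its centroid and inherits the cluster's majority label, which shows how propagation can mislabel outliers.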

  15. Design 2: Protein-based Selection
      • Pick a random protein
      • Find labels for all windows in this protein
      • For each node containing labels, find the mode L of all labels it contains
      • Assign L to the remaining data in the node
      • Repeat and update for each new protein, until half of the proteins have been selected

  16. Protein-based Results (figure axis: Percent)
      Repeated for different permutations of protein selection order, with several metrics observed.
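
A minimal sketch of the protein-based design, under stated assumptions: each window is a hypothetical `(protein_id, node, label)` tuple, the caller supplies the protein order (for example a random permutation from `random.shuffle`), and the function computes the state reached after half of the proteins have been selected instead of updating after every protein. The name `protein_based_labels` is made up for illustration.

```python
from collections import Counter, defaultdict


def protein_based_labels(windows, protein_order):
    """Label windows by node-wise mode after revealing half of the proteins.

    windows: list of (protein_id, node, label) tuples.
    protein_order: proteins in the order they are picked.
    Returns a dict of window index -> revealed or propagated label.
    """
    n_select = len(protein_order) // 2  # "until half have been selected"
    selected = set(protein_order[:n_select])

    node_votes = defaultdict(Counter)
    predicted = {}
    # Reveal the labels of every window belonging to a selected protein.
    for i, (protein, node, label) in enumerate(windows):
        if protein in selected:
            predicted[i] = label
            node_votes[node][label] += 1

    # Assign each node's most common revealed label to its remaining windows.
    for i, (protein, node, _) in enumerate(windows):
        if i not in predicted and node in node_votes:
            predicted[i] = node_votes[node].most_common(1)[0][0]
    return predicted


# Toy example with made-up proteins, nodes, and labels.
windows = [
    ("P1", 0, "TM"), ("P1", 0, "TM"), ("P1", 1, "NTM"),
    ("P2", 0, "TM"), ("P2", 1, "NTM"), ("P2", 2, "NTM"),
]
predicted = protein_based_labels(windows, protein_order=["P1", "P2"])
```

Windows in nodes that received no revealed labels (node 2 here) stay unlabeled. Running the function over several permutations of `protein_order` mirrors the repeated trials mentioned on the results slide.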

  17. Conclusions
      • We developed a framework that allows us to select a few proteins or fragments of proteins which, when annotated with experimental methods, may be used to label remaining protein sequences.
      • We have shown that it is possible to achieve higher accuracy values with guided selection of data compared to random selection of data.

  18. Acknowledgements
      Madhavi Ganapathiraju, Jessica Wehner. JW funded through NIH-NSF Bioengineering & Bioinformatics Summer Institute. Visit us at the Department of Biomedical Informatics, University of Pittsburgh: www.dbmi.pitt.edu/madhavi. Thank you! (Photo: Cathedral of Learning, University of Pittsburgh)
