
SMURFLite: Enhancing Homology Detection for Beta-Structural Proteins Using Random Fields

Combining simplified Markov random fields with simulated evolution improves remote homology detection for beta-structural proteins. Learn the importance of homologous proteins, computational approaches to detecting homology, and how SMURFLite outperforms other methods in β-structural protein analysis.

chungs


Presentation Transcript


  1. SMURFLite: combining simplified Markov random fields with simulated evolution improves remote homology detection for beta-structural proteins into the twilight zone. Noah M. Daniels | Raghavendra Hosur | Bonnie Berger | Lenore Cowen

  2. What are Homologous Proteins? Proteins that preserve related structure (and often function) because they have evolved from a common ancestor. Examples: pig insulin (PDB ID: 1m5a) and human insulin (PDB ID: 1mso).

  3. Why is homology important? Common ancestor → similar structure → similar function.

  4. Computational Approaches to Detecting Homology. Sequence-based methods work best when homology is not too distant:

       S1  LTPEEKSAVTALWGKV--NVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKV  60
           L+P +K+ V A WGKV  +  E G EAL R+ + +P T+ +F  F DLS      G+ +V
       S2  LSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS-----HGSAQV  55

     These proteins, aligned by BLAST, have probably evolved from a common ancestor.
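
How close two aligned sequences are is often summarized as percent identity, the same measure quoted later in the talk for the Thermotoga hit. A minimal sketch, assuming identity is counted over all alignment columns (other conventions divide by the shorter sequence or by non-gap columns only); the function name `percent_identity` is illustrative and not part of BLAST:

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent of alignment columns holding the same residue in both rows.

    Assumption: the denominator is the full alignment length, gaps included.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    # A column counts as identical only if both rows hold the same residue
    # (two gap characters never count as a match).
    identical = sum(1 for a, b in zip(row_a, row_b) if a == b and a != "-")
    return 100.0 * identical / len(row_a)


# The two aligned rows shown on the slide.
S1 = "LTPEEKSAVTALWGKV--NVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKV"
S2 = "LSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS-----HGSAQV"

print(f"{percent_identity(S1, S2):.1f}% identity over {len(S1)} columns")
```

Under this convention, the pair on the slide is well above the roughly 20% identity level that the talk later treats as remote.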

  5. A Greater Challenge: Detect Remote Homologs

  6. Sequence data: how will we keep up?

  7. HMM is trained from Sequence Alignment of Known Structures

  8. Profile HMM

  9. HMMs cannot capture nonlocal interactions

  10. Markov random fields add nonlocality to HMMs

  11. Let’s look at what this would mean for propeller folds

  12. SMURF: Structural Motifs Using Random Fields. Can we get the benefit of pairwise correlations without having to throw away all sequence info?

  13. The template is learned from solved structures in the PDB

  14. The template is learned from solved structures in the PDB: Aligned with Matt

  15. The template is learned from solved structures in the PDB: Aligned with Matt

  16. Two pairwise beta tables, one for exposed residues and one for buried residues, are learned from amphipathic beta sheets that are not propellers, taken from solved structures in the PDB. http://bcb.cs.tufts.edu/propellers/si/

  17. Computing a Score. Sequences are scored by computing their best “threading” or “parse” against the template, as the sum HMM score + pairwise score. This is no longer polynomial time (it requires multi-dimensional dynamic programming), but it is tractable on propellers because paired beta-strands don’t interleave too much. See: Menke, Berger and Cowen, “Markov Random Fields Reveal an N-terminal double beta-propeller motif as part of a bacterial hybrid two-component sensor system”, PNAS March 2010. http://smurf.cs.tufts.edu
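
The "HMM score + pairwise score" sum can be illustrated for one fixed threading. This is a toy sketch under strong assumptions, not SMURF's implementation: the placement is gapless, the per-position emission scores and the pairwise table are made-up numbers, and the names `threading_score`, `emissions`, `pairs` and `pair_table` are hypothetical. The real method searches over all parses, which is where the multi-dimensional dynamic programming comes in.

```python
from typing import Dict, List, Tuple


def threading_score(
    seq: str,
    emissions: List[Dict[str, float]],
    pairs: List[Tuple[int, int]],
    pair_table: Dict[Tuple[str, str], float],
    default_emission: float = -1.0,
) -> float:
    """Score one gapless placement of seq on a template.

    emissions[k] maps residue -> log-odds score at template position k.
    pairs lists template positions (i, j) that are paired across beta-strands.
    pair_table maps (residue_i, residue_j) -> pairwise score.
    All tables here are hypothetical stand-ins for learned parameters.
    """
    if len(seq) != len(emissions):
        raise ValueError("gapless sketch: sequence and template lengths must match")
    # Local (HMM-like) term: one emission score per position.
    hmm = sum(emissions[k].get(res, default_emission) for k, res in enumerate(seq))
    # Nonlocal term: one score per paired position, unseen pairs score 0.
    pairwise = sum(pair_table.get((seq[i], seq[j]), 0.0) for i, j in pairs)
    return hmm + pairwise


toy_emissions = [{"V": 1.0}, {"A": 0.5}, {"V": 1.0}]
toy_pairs = [(0, 2)]
toy_table = {("V", "V"): 2.0}
print(threading_score("VAV", toy_emissions, toy_pairs, toy_table))
```

The pairwise term is what an ordinary profile HMM cannot express: it couples two positions that may be far apart in sequence.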

  18. What makes SMURF turn blue? :( Deeply interleaved β-strand pairs. SMURF’s running time is exponential in the interleave number!

  19. Interleave number is a measure of β-strand complexity. (Figure: example β-strand pairs with interleave numbers 2 and 1.)

  20. β propellers have a maximum interleave of 3

  21. β barrels range from an interleave of 4 to 8

  22. How do we make SMURF happy? Simplify the dependency graph: only consider beta-strand pairs up to an interleave threshold of i. i=0: ordinary HMM; i=1: fast; i=2: still fast; i=3: still tractable; i=4: getting too slow.
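
The thresholding step can be sketched as a filter over β-strand pairs. The slides do not spell out how the interleave number is computed, so this sketch assumes, purely for illustration, that strands are numbered in sequence order and that a pair's interleave is the difference of its two strand indices (a hairpin between neighbouring strands gets 1). The authors' definition may differ; `interleave` and `keep_pairs` are hypothetical names.

```python
from typing import List, Tuple

# A pair of strand indices, with strands numbered in sequence order.
Pair = Tuple[int, int]


def interleave(pair: Pair) -> int:
    """Assumed definition: separation of the paired strands in sequence order."""
    a, b = pair
    return abs(b - a)


def keep_pairs(pairs: List[Pair], threshold: int) -> List[Pair]:
    """Keep only strand pairs whose interleave does not exceed the threshold."""
    return [p for p in pairs if interleave(p) <= threshold]


example_pairs = [(0, 1), (1, 2), (0, 3), (2, 7)]
for i in range(5):
    print(i, keep_pairs(example_pairs, i))
```

With a threshold of 0 no pairs survive, so the model has no nonlocal edges left and behaves like an ordinary HMM, matching the i=0 case on the slide.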

  23. SMURFLite ignores highly interleaved β-strand pairs

  24. SMURFLite ignores highly interleaved β-strand pairs. Can we somehow weakly capture the discarded pairwise information?

  25. Simulated Evolution (Kumar and Cowen, 2010) • An HMM is “only as good as the training data” – but is it? • Leverage our knowledge of evolution to construct new, artificial training data • Kumar and Cowen, Bioinformatics 2009 and 2010.
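
The idea of constructing artificial training data can be sketched with a deliberately simple mutation model. This is not the Kumar and Cowen model: it assumes independent point mutations with a uniform choice among the other 19 amino acids, where a realistic model would use evolution-derived substitution frequencies. The names `mutate` and `augment` and the rate values are illustrative.

```python
import random
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Return a copy of seq with each residue replaced with probability rate.

    Assumption: replacements are drawn uniformly from the other 19 residues.
    Gap characters and unknown symbols are left untouched.
    """
    out = []
    for res in seq:
        if res in AMINO_ACIDS and rng.random() < rate:
            out.append(rng.choice(AMINO_ACIDS.replace(res, "")))
        else:
            out.append(res)
    return "".join(out)


def augment(training: List[str], copies: int, rate: float, seed: int = 0) -> List[str]:
    """Original sequences followed by `copies` mutated variants of each."""
    rng = random.Random(seed)
    variants = [mutate(s, rate, rng) for s in training for _ in range(copies)]
    return training + variants


augmented = augment(["LTPEEKSAVTALWGKV", "LSPADKTNVKAAWGKV"], copies=3, rate=0.1)
print(len(augmented), "sequences available for HMM training")
```

The augmented set would then be used to train the profile in place of the original alignment alone.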

  26. β-strand Mutation Model

  27. SE pipeline

  28. SE pipeline. We showed that this improved performance for HMMs; how about for our new MRFs?

  29. Pairwise evolution model Exposed Residue Buried Residue http://bcb.cs.tufts.edu/propellers/si/

  30. SMURFLite: simplified, augmented MRF. (1) Identify beta-strand pairs. (2) Count their interleave number. (3) Augment the training profile with simulated evolution on beta-strand paired residues. (4) Exclude beta-strand pairs that are too interleaved from the MRF.
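
Simulated evolution on beta-strand paired residues can be sketched as a mutation that looks at a residue's pairing partner. Everything in this sketch is an assumption made for illustration: the function `mutate_paired`, the shape of `partner_weights` (partner residue mapped to weights over replacement residues) and the toy numbers are hypothetical and do not reproduce the pairwise tables used by the authors.

```python
import random
from typing import Dict, List, Tuple


def mutate_paired(
    seq: str,
    pairs: List[Tuple[int, int]],
    partner_weights: Dict[str, Dict[str, float]],
    rate: float,
    rng: random.Random,
) -> str:
    """Mutate position i of each pair (i, j), conditioned on the residue at j.

    partner_weights[res_j] gives hypothetical weights over residues that
    may be placed opposite res_j in a beta-sheet.
    """
    residues = list(seq)
    for i, j in pairs:
        if rng.random() >= rate:
            continue  # this pair is left unchanged
        weights = partner_weights.get(residues[j])
        if not weights:
            continue  # no information about this partner residue
        candidates = list(weights)
        residues[i] = rng.choices(
            candidates, weights=[weights[c] for c in candidates], k=1
        )[0]
    return "".join(residues)


toy_weights = {"V": {"I": 1.0}}  # toy table: opposite V, always place I
rng = random.Random(0)
print(mutate_paired("AVAV", [(0, 3)], toy_weights, rate=1.0, rng=rng))
```

Sequences mutated this way carry some pairwise signal in the profile itself, which is the stated motivation for combining this step with the removal of highly interleaved pairs from the MRF.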

  31. The SMURFLite Pipeline

  32. Results: The Dataset

  33. Results: The Dataset. 5-bladed, 6-bladed, 7-bladed, and 8-bladed propeller folds. All 11 SCOP superfamilies in the mainly-beta class that contain the word “barrel” in the description (excluding 2 that are not structurally consistent).

  34. SMURFLite compared to HMMer, Raptor, and HHpred

  35. SMURFLite handles β barrels and sandwiches. Example: Translation proteins

  36. SMURFLite compared to other programs

  37. Interleaving still matters! Example: Barwin-like endoglucanases

  38. All this lets us search whole genomes. Thermotoga maritima: 1852 genes, 207 β-structural templates. We find 139 “hits”; 28 have solved structures in the PDB. 8 predictions agree with Zhang et al.; none are contradicted. Gene Q9X087 (“putative uncharacterized protein”) has only 20% identity with its closest solved BLAST hit (Rhoptry protein from Plasmodium yoelii yoelii). We predict it belongs to “beta-Galactosidase/glucuronidase domain” with a p-value of 0.0006. Credit: K.O. Stetter & R. Rachel, Univ. Regensburg.

  39. In the end... MRFs outperform the competition on β structures. They get too complicated on some structures. We can “fix” this by snipping out the hard parts. But we don’t want to lose all that information, so we use Simulated Evolution to partially preserve it. This lets us do whole-genome searches quickly. We found some possible annotations in Thermotoga!

  40. Where do we go from here? Dynamic programming is too slow. Idea: stochastic search for beta-strand positions; in between, standard dynamic programming to solve the HMM.

  41. SMURFLite Thanks to Matt Menke, Anoop Kumar, and Jinbo Xu. This work was funded in part by NIH grant 1R01GM080330 (to Lenore Cowen) and 1R01GM081871 (to Bonnie Berger). bcb.cs.tufts.edu
