PREDICTING PROTEIN STRUCTURE AND BEYOND …. PowerPoint Presentation
Presentation Transcript

  1. PREDICTING PROTEIN STRUCTURE AND BEYOND … P. V. Balaji, Biotechnology Center, I.I.T., Bombay

  2. Organization of the talk: (1) Why predict the structure? (2) Methods for structure prediction. (3) What next?

  3. Genome Size is not Proportional to the Complexity of the Organism (figure axes: Complexity; Size of the Genome)

  4. Molecular Logic of Life is the Same. English: 26-letter alphabet; only one grammar; extremely diverse literature. Genome: 4-letter alphabet; only one grammar; extremely diverse organisms. Biochemically, all living things (animals, plants, bacteria, viruses, etc.) are remarkably similar.

  5. Genome Sequencing and Analysis: One of the Key Steps in Deciphering the Logic of Life. Even minute details have to be analyzed: "Hang him, not let him go" versus "Hang him not, let him go". Humans: NeuNAc (–CH3); Chimpanzees: NeuNGc (–CH2OH).

  6. Innovations in Technology Have Made Genome Sequencing a Routine Affair. Genome sequencing completed: ~70 organisms; in the pipeline: several more. “ … it is unlikely that the base sequence of more than a few percent of such a complex DNA will ever be determined …” (C. W. Schmid & W. R. Jelinek, Science, June 1982)

  7. One Aspect of Genome Sequence Analysis is to Assign Functions to Proteins (Reverse Genetics). Proteins are the workhorses of the cell and are involved in every aspect of living systems.

  8. Function of a Protein can be Defined at Different Levels. Example: lysozyme. Biochemical level: hydrolyzes a C-O bond. Physiological level: breaks down the cell wall. Cellular level: defense against infection. Different analysis tools provide functions at different levels.

  9. Hallmark of Proteins: Specificity. Proteins know exactly which small molecule (ligand) they should bind to or interact with, and also which part of a macromolecule they should bind to.

  10. Origin of Specificity: function is critically dependent on structure (figure: 1ruv.pdb).

  11. Structure – Key to Dissect Function. Diagram labels around "Structure": location of mutants, conserved residues, SNPs; clefts (active sites); dynamics (breathing); surface shape & charge; antigenic sites, surface patches; crystal packing; functional oligomerization; relative juxtaposition; fold; interaction interfaces; catalytic clusters; motifs; catalytic mechanism; evolutionary relationships.

  12. Sequence Determines Structure. Residues 1-124 of 1ruv.pdb: KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHESLADVQAVCSQKNVACKNGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTTQANKHIIVACEGNPYVPVHFDASV. Christian B. Anfinsen: Nobel Prize in Chemistry (1972).

  13. How Does Sequence Specify Structure? The Protein Folding Problem (second half of the genetic code). Diagram labels: Sequence; Structure; Function; Functional Genomics. Structure has to be determined experimentally.

  14. Experimental Methods of Structure Determination. X-ray crystallography: provides a static picture; requires solubilization of the over-expressed protein and obtaining crystals that diffract. Nuclear Magnetic Resonance spectroscopy: provides a dynamic picture; requires solubilization of the over-expressed protein; the size limit is a major factor.

  15. Limitations of Experimental Methods: Consequences. Annotated proteins in the databank: ~100,000. Total number including ORFs: ~700,000. Proteins with known structure: ~5,000 (the dataset for analysis). An ORF, or Open Reading Frame, is a region of the genome that codes for a protein; ORFs have been identified by whole-genome sequencing efforts. ORFs with no known function are termed orphan.
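
The ORF definition on this slide can be illustrated with a minimal sketch. It assumes the standard start codon (ATG) and stop codons (TAA, TAG, TGA) and scans only the three forward-strand reading frames; the function name `find_orfs` and the `min_codons` parameter are hypothetical and not from the talk. Real gene finders also scan the reverse strand and apply statistical models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return sorted (start, end, sequence) tuples for ATG...stop runs.

    Only the three forward reading frames are scanned (an assumption made
    to keep the sketch short). min_codons counts the start and stop codons.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the frame one codon (3 bases) at a time.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None
    return sorted(orfs)

if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGCC"))
```

An ORF found this way is only a candidate protein-coding region; assigning it a function is the problem the rest of the talk addresses.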

  16. Structural Biology Consortia: Brute Force Approach Towards Structure Elucidation. Aim to solve about 400 structures a year; employ battalions of Ph.D.s and post-doctorals; large-scale expression and crystallization attempts. Minus: basic strategies remain the same; no (known) new tricks; “unrelenting” ones will be ignored. Plus: enhances the statistical base for inferring sequence-structure relationships.

  17. Predicting Protein Structure: 1. Comparative Modeling (formerly, homology modeling). Homologous proteins share similar sequence. Sequence 1: KQFTKCELSQNLYDIDGYGRIALPELICTMFHTSGYDTQAIVENDESTEYGLFQISNALWCKSSQSPQSRNICDITCDKFLDDDITDDIMCAKKILDIKGIDYWIAHKALCTEKLEQWLCEKE. Sequence 2: KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL. Figure labels: 1alc; ?; use as template & model; 8lyz.

  18. Comparative Modeling. Basis: structure is much more conserved than sequence during evolution; the higher the similarity, the higher the confidence in the modeled structure. Limited applicability: a large number of proteins and ORFs have no similarity to proteins with known structure.
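
Because confidence in a comparative model rises with sequence similarity, a common first step is to align the query with a candidate template and measure percent identity. The sketch below is an illustration only: it uses a plain Needleman-Wunsch global alignment with a simple match/mismatch/linear-gap scheme, which is an assumption and not the method or scoring used in the talk. The names `global_align` and `percent_identity` are hypothetical; production tools use substitution matrices and affine gap penalties.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))

def percent_identity(a, b):
    """Identical aligned positions as a percentage of alignment length."""
    al_a, al_b = global_align(a, b)
    same = sum(x == y and x != "-" for x, y in zip(al_a, al_b))
    return 100.0 * same / len(al_a)

if __name__ == "__main__":
    print(global_align("ACGT", "AGT"))
    print(percent_identity("ACGT", "AGT"))
```

Dividing by the alignment length is one of several conventions for percent identity; dividing by the shorter sequence length is another, and the two can differ noticeably when gaps are frequent.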

  19. Predicting Protein Structure: Alternative Methods. Threading or Fold Recognition; Ab initio. Both these methods depend heavily on the analysis of known protein structures. In addition, establishing the sequence-structure relationship is also important. Input from people trained in statistics, pattern recognition and related areas of computer science is very critical.

  20. Statistical Analysis of Protein Structures: Microenvironment Characterization. Describe structures at multiple levels of detail using a comprehensive set of properties. Atom-based properties: type, hydrophobicity, charge. Residue-based properties: type, hydrophobicity. Chemical group: hydroxyl, amide, carbonyl, etc. Secondary structure: α-helix, β-strand, turn, loop. Other properties: VDW volume, B-factor, mobility, solvent accessibility.
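
As one concrete instance of a residue-based property, the sketch below computes a sliding-window hydrophobicity profile along a sequence. The slide does not name a hydrophobicity scale; the Kyte-Doolittle hydropathy scale is used here as an assumption, and the function name `hydropathy_profile` and the `window` parameter are hypothetical.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=5):
    """Mean hydropathy of each full window of `window` residues."""
    values = [KYTE_DOOLITTLE[residue] for residue in seq.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

if __name__ == "__main__":
    # First residues of the 1ruv.pdb sequence shown earlier in the talk.
    print(hydropathy_profile("KETAAAKFERQHMDSS", window=5))
```

A microenvironment description of the kind listed on the slide would combine many such per-atom and per-residue properties computed from 3D coordinates; a sequence-only profile is the simplest member of that family.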

  21. Predicting Protein Structure: 2. Threading or Fold Recognition. Basis: it is estimated that there are only around 1,000 to 10,000 stable folds in nature; irrespective of the amino acid sequence, a protein has to adopt one of these folds. Fold recognition is essentially finding the best fit of a sequence to a set of candidate folds: select the best sequence-fold alignment using a fitness scoring function (an NP-complete problem).

  22. Fold of a Protein: refers to the spatial arrangement of its secondary structural elements (α-helices and β-strands). Figure labels: 1l45.pdb, 4bcl.pdb, 1mbl.pdb; α/β-barrel, β-barrel, α/β-sandwich.

  23. Threading: Basic Strategy. Diagram labels: query sequence (dhgakdflsdfjaslfkjsdlfjsdfjasd); library of folds; template; spatial interactions; scoring & selection.
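
The strategy of scoring a query against every fold in a library and selecting the best can be sketched in a heavily simplified form. Everything below is a hypothetical toy: each fold is reduced to a string of buried (B) or exposed (E) positions, the score only rewards hydrophobic residues at buried positions and polar residues at exposed ones, and the query is slid along the template without gaps. The names `thread_score` and `best_fold`, the residue classification, and the example library are assumptions, not the speaker's scoring function.

```python
# Crude two-class residue alphabet (an assumption for illustration).
HYDROPHOBIC = set("AVLIMFWC")

def thread_score(query, environments):
    """Best gapless score of `query` over all offsets along a fold.

    `environments` is a string of 'B' (buried) / 'E' (exposed) positions.
    Each position scores +1 if residue polarity suits it, else -1.
    Returns None when the fold is shorter than the query.
    """
    best = None
    for offset in range(len(environments) - len(query) + 1):
        score = 0
        for k, residue in enumerate(query):
            buried = environments[offset + k] == "B"
            score += 1 if buried == (residue in HYDROPHOBIC) else -1
        if best is None or score > best:
            best = score
    return best

def best_fold(query, library):
    """Score the query against every fold and return (best name, all scores)."""
    scores = {}
    for name, environments in library.items():
        score = thread_score(query, environments)
        if score is not None:
            scores[name] = score
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    library = {"fold_a": "BBEEBB", "fold_b": "EEBBEE"}  # hypothetical folds
    print(best_fold("LLKK", library))
```

Practical threading methods allow gaps in the sequence-fold alignment and score pairwise spatial interactions between residues, which is what makes the full alignment problem computationally hard.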

  24. Predicting Protein Structure: 3. Ab Initio Methods. Diagram labels: sequence; secondary structure prediction; tertiary structure; mean field potentials; energy minimization; low energy structures; predicted structure; validation.
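
The idea of searching for low energy structures can be illustrated with a toy model. The sketch below uses the 2D HP lattice model (H for hydrophobic, P for polar residues) with exhaustive enumeration of chain conformations; this is a deliberately swapped-in simplification, not the mean field potentials or the minimization procedure named on the slide. The function names `energy` and `fold` are hypothetical, and the exhaustive search is only practical for chains of roughly ten residues or fewer.

```python
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def energy(seq, coords):
    """-1 per pair of H residues adjacent on the lattice but not in the chain."""
    position = {c: i for i, c in enumerate(coords)}
    total = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in MOVES:
            j = position.get((x + dx, y + dy))
            # j > i + 1 counts each contact once and skips chain neighbours.
            if j is not None and j > i + 1 and seq[j] == "H":
                total -= 1
    return total

def fold(seq):
    """Enumerate all self-avoiding walks; return (lowest energy, coords)."""
    if not seq:
        return (0, [])
    best = [1, None]  # any real energy is <= 0, so 1 is a safe sentinel

    def extend(coords):
        if len(coords) == len(seq):
            e = energy(seq, coords)
            if e < best[0]:
                best[0], best[1] = e, list(coords)
            return
        x, y = coords[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in coords:  # keep the walk self-avoiding
                coords.append(nxt)
                extend(coords)
                coords.pop()

    extend([(0, 0)])
    return (best[0], best[1])

if __name__ == "__main__":
    print(fold("HPPHHPH"))
```

Even in this toy setting the number of conformations grows exponentially with chain length, which gives some feel for why ab initio prediction relies on approximate potentials and heuristic minimization rather than exhaustive search.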

  25. Small molecules and/or metal ions are an integral part of certain proteins (figure: 1a6g.pdb). Predicting the structure of such proteins is an entirely different challenge.

  26. Proof of the Pudding: CASP Meetings. Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction 4 (CASP4). Predictions, not post-dictions. Easy and medium targets: ~100% success. Hard targets: ~50% success. Significant increase from CASP3.

  27. OK, I can predict the structure correctly! Is that it? Well, no! Detailed biochemical characterization is required. Strict structure-function correlation exists only for a subset of proteins. Some folds (ferredoxin, TIM barrel, …) are very popular: several protein families, with diverse functions, adopt these folds. Despite high similarity in sequence and structure, proteins may act on different substrates (hence have different functions) due to subtle changes in the active site (β1,3-GalT and β1,3-GlcNAcT).

  28. Inferring Function from Structure: Caveats. Similar structure, mutually exclusive function: lysozyme & α-lactalbumin (8lyz.pdb, 1alc.pdb). Same function, completely different structures: carbonic anhydrases from M. thermophila and mouse (1thj.pdb, 1dmx.pdb). “Moonlighting” proteins, one structure(?), multiple functions: Gal1p is a kinase as well as a regulator of Gal-gene expression, whereas Gal3p is 70% similar but does not have kinase activity. Glyceraldehyde 3-phosphate dehydrogenase: glycolysis; binding protein for plasmin, fibronectin and lysozyme; transcriptional control of gene expression, DNA replication and repair; flocculation.

  29. Same fold, different oligomerization: dimerization and tetramerization (figure labels: ConA, ConA, PNA, PNA, GSIV).

  30. Ligand Induced Conformational Changes are Quite Common. Binding of the first substrate redefines the active site and creates the binding pocket for the second substrate and the metal ion. Figure labels: flexible loop; before; after.

  31. Take Home Message. Predicting protein structure is a key component of genome sequence analysis. Structure is a very important link in deciphering function. Are new tools required? Or is a larger training dataset required?

  32. Acknowledgement: the organizers, for giving me this opportunity; Sujatha and Jayadeva Bhat, for helping me put together this talk. A Few Useful Links: http://guitar.rockefeller.edu/modeller/modeller.html ; http://www.biochem.ucl.ac.uk/bsm/cath-new/index.html ; http://predictioncenter.llnl.gov/ ; http://insulin.brunel.ac.uk