Wrong assumptions and misinterpretations in explanations of biological models, phenomena and processes, or: Is the biologist logical, and the computer scientist alive? Jacek Leluk, ICM UW.


Presentation Transcript


  1. Wrong assumptions and misinterpretations in explanations of biological models, phenomena and processes, or: Is the biologist logical, and the computer scientist alive? Jacek Leluk, ICM UW

  2. How is it that your genome is 98% the same as the genome of a chimpanzee and only 50% the same as your own father’s genome? From "O składności członków człowieczych" ("On the Arrangement of Human Body Parts"): "Why do birds give no milk? Because they would need teats, which would hinder them in flying." Andrzej z Kobylina (16th century)

  3. Is biology „bilogical”?
     Nomenclature chaos:
     • Mitochondria or chondriosomes?
     • Is papain a proteolytic enzyme?
     • Definition of identity, similarity and homology
     Misinterpretation:
     • Amino acid sequence of a gene?
     • Why are squash inhibitors inhibitors?
     • Is wheat agglutinin meant to agglutinate rabbit red cells?
     Incomplete knowledge:
     • Stochastic index matrices
     • Statistical description of biological processes

  4. The problem of terminology
     • BPTI: Basic Pancreatic Trypsin Inhibitor, Bovine Pancreatic Trypsin Inhibitor, or Basic Protein Trypsin Inhibitor
     • PAM: Point Accepted Mutations or Percent Accepted Mutations
     • Kunitz trypsin inhibitor: BPTI (mammalian organs) or STI (soybean trypsin inhibitor)

  5. What may everybody do wrong?
     • Monte Carlo approach in structure analysis and prediction: what state do we predict?
     • Mathematical modelling of life processes: Markov chains and protein evolution and differentiation; significance similarity estimation

  6. What may biologists do wrong?
     • Amino acids and proteins: do proteins consist of amino acids as we describe?
     • Definitions and theory: definition of species and theory of evolution; definitions and biology
     • Correlated mutations: dispersed correlation

  7. What may theoreticians do wrong?
     • Primitive or ancestral? (Cyanophyta, Archaebacteria, ape and human)
     • Global and local energy minima: can we predict the exact conformation at an exact time?
     • Microscopic/mesoscopic/macroscopic processes: water molecule and tsunami
     • Assumptions and conclusions: incomplete assumptions and wrong conclusions; deformations by simplifying; is the protein sequence just a string of characters?

  8. Sequence identity estimation in proteomics and genomics Identity threshold – does it make sense?

  9. WHAT IS IMPORTANT IN THE PROTEIN SIMILARITY SEARCH?
     1) Contribution (%) of identical positions
     2) Length of the compared strings (sequences)
     3) Distribution of the identical positions along the analyzed sequence

  10. WHAT IS IMPORTANT IN THE PROTEIN SIMILARITY SEARCH? (continued)
     4) Residues at the conservative positions
     5) Structural/genetic similarity of the amino acids at non-conservative positions
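
Criteria 1 to 3 from slides 9 and 10 can be illustrated with a minimal sketch. The function name `identity_profile` and the two toy sequence pairs are hypothetical and not taken from the presentation; the sequences are assumed to be already aligned to equal length. Both pairs have the same percentage of identical positions, yet the identities are distributed very differently.

```python
def identity_profile(a: str, b: str) -> dict:
    """Summarise identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Positions where both sequences carry the same residue (gap vs gap never counts).
    matches = [i for i, (x, y) in enumerate(zip(a, b)) if x == y and x != "-"]
    # Longest uninterrupted run of identical positions.
    longest = run = 0
    prev = None
    for i in matches:
        run = run + 1 if prev is not None and i == prev + 1 else 1
        longest = max(longest, run)
        prev = i
    return {
        "length": len(a),                      # criterion 2
        "identical": len(matches),
        "percent_identity": 100.0 * len(matches) / len(a) if a else 0.0,  # criterion 1
        "positions": matches,                  # criterion 3
        "longest_run": longest,
    }


# Hypothetical toy sequences: same identity percentage, different distribution.
clustered = identity_profile("ACDEFGHIKL", "ACDEFWWWWW")
scattered = identity_profile("ACDEFGHIKL", "AWDWFWHWKW")
print(clustered["percent_identity"], clustered["longest_run"])
print(scattered["percent_identity"], scattered["longest_run"])
```

A single identity threshold would treat both pairs the same, which is the concern raised on slide 8.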

  11. Sequence multiple alignment: the problem of gap manipulation. Any protein can be aligned with any other as homologous/similar.
     Without gaps:
       anybiologicalstring
       anybilogicalstrip
     With gaps:
       anybiologicalstri-ng
       anybi-logicalstrip
     With many gaps:
       anyproteincanbealigned
       -an-yprote--i-ncanb-----ealigned
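
The effect of gap manipulation can be sketched as follows. This is an illustration under stated assumptions, not the author's method: gaps are assumed to be free (no penalty), in which case the largest achievable number of identical aligned positions equals the length of the longest common subsequence. The function names are hypothetical; the two strings are the ones shown on the slide.

```python
def ungapped_matches(a: str, b: str) -> int:
    """Identical positions when the strings are compared left to right without gaps."""
    return sum(1 for x, y in zip(a, b) if x == y)


def max_matches_with_free_gaps(a: str, b: str) -> int:
    """Longest common subsequence length: the best match count if gaps cost nothing."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


s1 = "anybiologicalstring"
s2 = "anybilogicalstrip"
no_gaps = ungapped_matches(s1, s2)
free_gaps = max_matches_with_free_gaps(s1, s2)
print(f"without gaps: {no_gaps} of {len(s2)} positions identical")
print(f"with free gaps: {free_gaps} of {len(s2)} positions identical")
```

Without any gap penalty the match count can only go up, so an apparently high identity says little unless the number and placement of gaps are also reported.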

  12. Statistical approaches vs. accuracy: how far may they be improved? Protein secondary structure prediction: accuracy 70-72% (not much changed since 1978). 100% accuracy requires a complete database of all possible structures. For 30 AA polypeptides: 20^30 sequences/secondary structures. Searching the database for the appropriate sequence/structure at a rate of 10^12 sequences/sec would take 1.8 billion times longer than the age of the Universe.
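
The order of magnitude on this slide can be checked with a few lines of arithmetic. The sequence count and search rate come from the slide; the age of the Universe (13.8 billion years) and the year length are assumptions made here, and the slide does not state which value it used. With these assumptions the ratio comes out at a few billion, the same order of magnitude as the slide's figure.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumption: Julian year

n_sequences = 20 ** 30                  # all 30-residue sequences over 20 amino acids
rate_per_second = 10 ** 12              # search rate quoted on the slide
universe_age_years = 13.8e9             # assumption, not stated on the slide

search_seconds = n_sequences / rate_per_second
search_years = search_seconds / SECONDS_PER_YEAR
ratio = search_years / universe_age_years

print(f"sequences to search: {n_sequences:.3e}")
print(f"search time:         {search_years:.3e} years")
print(f"search time / age of the Universe: {ratio:.2e}")
```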

  13. Genetic conditioning of the amino acid replacement probabilities and spectrum in molecular evolution

  14. The Markov model assumes that the probability of substituting amino acid AA1 by AA2 is the same regardless of which residue (AAx or AAy) AA1 itself arose from:
     AAx → AA1 → AA2, with Pa the probability of the step AA1 → AA2
     AAy → AA1 → AA2, with Pb the probability of the step AA1 → AA2
     Pa = Pb
     The currently used statistical algorithms are based on the Markovian model of amino acid replacement (they directly use stochastic matrices of replacement frequency indices).
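
The assumption on this slide can be written compactly. The notation is introduced here for illustration only and is not from the presentation: $X_n$ stands for the residue occupying a given position after the $n$-th replacement.

```latex
% First-order Markov assumption for amino acid replacement
% (X_n is hypothetical notation: the residue at a position after the n-th replacement)
\begin{align}
P_a &= P\left(X_{n+1} = AA_2 \mid X_n = AA_1,\; X_{n-1} = AA_x\right) \\
P_b &= P\left(X_{n+1} = AA_2 \mid X_n = AA_1,\; X_{n-1} = AA_y\right) \\
P_a &= P_b = P\left(X_{n+1} = AA_2 \mid X_n = AA_1\right)
\end{align}
```

In words: once the current residue is known, the model treats its history as irrelevant.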

  15. BLOSUM62 matrix of amino acid replacements. Why is tryptophan the most conserved residue here?
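
To make the question concrete, the sketch below ranks the self-substitution (diagonal) scores of BLOSUM62. The values are the published diagonal of the matrix, typed in by hand so the example has no external dependency; the variable names are hypothetical. The sketch only restates the observation and does not attempt to answer the slide's question.

```python
# Diagonal (self-substitution) scores of BLOSUM62, keyed by one-letter code.
BLOSUM62_DIAGONAL = {
    "A": 4, "R": 5, "N": 6, "D": 6, "C": 9, "Q": 5, "E": 5, "G": 6, "H": 8, "I": 4,
    "L": 4, "K": 5, "M": 5, "F": 6, "P": 7, "S": 4, "T": 5, "W": 11, "Y": 7, "V": 4,
}

# Rank residues from highest to lowest self-substitution score.
ranked = sorted(BLOSUM62_DIAGONAL.items(), key=lambda kv: kv[1], reverse=True)
most_conserved, top_score = ranked[0]

for residue, score in ranked:
    print(residue, score)
```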

  16. Replacement Arg → Lys according to the statistical interpretation using stochastic matrix indices: Arg → Lys

  17. Arginine-to-lysine mutational replacements (diagram). Codons shown: Met ATG, Lys AAG, Gln CAR, Leu CTR, Lys AAR, Arg AGR, Ser AGY, His CAY, Arg CGR.

  18. Possible one-point-mutational processing of serine with respect to its origin (diagram). Origins shown: Trp (UGG) → Ser (UCG); Asn (AAU) → Ser (AGU). Residues shown as products: Thr, Ala, Pro, Thr, Ile, Asn, Ser, Trp, Leu, Ser, Arg, Cys, (UAG), Gly.
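
The point of this slide can be reproduced from the standard genetic code. The sketch below lists what each of the two serine codons named on the slide can become after a single nucleotide substitution. The helper name `one_step_neighbours` is hypothetical, and `*` denotes a stop codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def one_step_neighbours(codon: str) -> set:
    """Amino acids (or stop) reachable from `codon` by exactly one nucleotide substitution."""
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                reachable.add(CODON_TABLE[mutant])
    return reachable


from_ucg = one_step_neighbours("UCG")   # serine that arose from Trp (UGG)
from_agu = one_step_neighbours("AGU")   # serine that arose from Asn (AAU)
print("Ser UCG ->", "".join(sorted(from_ucg)))
print("Ser AGU ->", "".join(sorted(from_agu)))
```

The two sets overlap only in Ser and Thr, so which replacements are open to a serine depends on the codon it carries, and therefore on where it came from.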

  19. Is arginine the same as arginine? Possible codons for arginine: AGA AGG CGA CGG CGC CGT
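
The question on this slide can be checked directly against the standard genetic code. The sketch below asks, for each of the six arginine codons, which lysine codons are one nucleotide substitution away. It uses the DNA alphabet, as the slide does; the variable names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


arg_codons = [c for c, aa in CODON_TABLE.items() if aa == "R"]
lys_codons = [c for c, aa in CODON_TABLE.items() if aa == "K"]

# For every Arg codon, the Lys codons that differ from it by a single nucleotide.
one_step_to_lys = {
    arg: [lys for lys in lys_codons if hamming(arg, lys) == 1] for arg in arg_codons
}
for arg, targets in one_step_to_lys.items():
    print(arg, "->", targets if targets else "no Lys codon within one substitution")
```

Only the AGA and AGG codons can become lysine in one step; the four CGN codons need at least two substitutions, which a single Arg-to-Lys matrix index does not distinguish.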

  20. Diagram of amino acid genetic relationships; diagram of codon genetic relationships

  21. Genetic relationships between Arg and Met/Gln [Figure: two codon-table grids of one-letter amino acid codes]

  22. Ala (GCG) → Val (GUG). What part of the codon contains the information about the previous amino acid that occurred at a certain position of the protein sequence? At most 2/3 of the entire codon.

  23. How long is the information about the codons of preceding amino acids stored? The shortest storage period is 3 transitions/transversions. Theoretically the longest period is infinite. Codons shown in the diagram: Ala GCG, Val GUG, Met AUG, Ile AUA; Ser UCC, Ser UCU, Thr ACU, Ser AGU; Lys AAA, Asn AAC, Asp GAC, His CAC, Gln CAG, Glu GAG, Asp GAU, Tyr UAU, His CAU, Asn AAU, Lys AAG, Gln CAG, His CAC, ...

  24. Correlated mutations: the phenomenon of several mutations occurring simultaneously and dependent on each other. According to the current hypothesis of molecular positive Darwinian selection, correlated mutations are related to the changes occurring in their neighborhood, they reflect protein-to-protein interaction, and they preserve the biological activity and structural properties of the molecule.

  25. The current explanation of correlated mutations occurrence (example)

  26. The three types of distribution of correlated positions present in myoglobins. The residue location and relative distribution is shown on the tertiary structure of human myoglobin (P0244, pdb1bzp). The spot correlation cluster.

     | Position no. and occurring residues | Correlation versus position 127 | |
     |---|---|---|
     | 127 [AMSTV] | A (58) | S (7) |
     | 27 [ADEFLNT] | ADEFNT | E |
     | 31 [GKRS] | GKRS | R |
     | 78 [AKLQ] | K | ALQ |
     | 109 [DEGNT] | DEGT | E |
     | 116 [AEHKQST] | AEHKQS | A |
     | 117 [AEKNQS] | AEKQS | E |
     | 122 [BDEN] | BDEN | D |

  27. The three types of distribution of correlated positions present in the Bowman-Birk inhibitor family. The residue location and relative distribution is shown on the tertiary structure of Bowman-Birk inhibitor from soybean (P01055). The narrow correlation cluster.

     | Position no. and occurring residues | Correlation versus position 13 | | |
     |---|---|---|---|
     | 13 [–ADFIKLMPRSTV] | L (11) | M (10) | A (8) |
     | 4 [–RSTVY] | V | –S | S |
     | 5 [–KPST] | K | –S | S |
     | 7 [AEGKP] | A | P | P |
     | 11 [EFHIKLQRST] | T | EHQ | S |
     | 21 [EFIKMQT] | T | Q | EQ |

  28. The three types of distribution of correlated positions present in eglin-like proteins. The residue location and relative distribution is shown on the tertiary structure of eglin C (P01051). The dispersed correlation.

     | Position no. and occurring residues | Correlation versus position 67 | |
     |---|---|---|
     | 67 [–DGNT] | D (8) | G (9) |
     | 10 [–ELNQRST] | ET | LNQRS |

  29. The three types of distribution of correlated positions present in lysozymes. The residue location and relative distribution is shown on the tertiary structure of lysozyme from rat (P00697, pdb5lyz). The dispersed correlation.

     | Position no. and occurring residues | Correlation versus position 80 | | |
     |---|---|---|---|
     | 80 [GHKNR] | G (7) | H (31) | N (16) |
     | 30 [ILMV] | MV | ILMV | V |
     | 40 [DFKNR] | DN | N | FKNR |
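
The "correlation versus position" tabulations on slides 26 to 29 can be reproduced in spirit with a minimal sketch. It groups the residues seen at one alignment column by the residue present at a reference column. The function names and the five-sequence toy alignment are hypothetical and not data from the presentation; columns are 0-based here, unlike the 1-based position numbers on the slides.

```python
from collections import defaultdict


def residues_by_reference(alignment, ref_pos, other_pos):
    """Group residues at `other_pos` by the residue found at `ref_pos` (0-based columns)."""
    groups = defaultdict(set)
    for seq in alignment:
        groups[seq[ref_pos]].add(seq[other_pos])
    return {ref: "".join(sorted(res)) for ref, res in groups.items()}


def fully_separated(groups):
    """True if no residue at the other position is shared between reference groups."""
    seen = set()
    for residues in groups.values():
        if seen & set(residues):
            return False
        seen |= set(residues)
    return True


# Hypothetical toy alignment: column 0 is the reference position.
toy_alignment = ["DKE", "DKT", "GAL", "GAN", "GAQ"]
groups = residues_by_reference(toy_alignment, ref_pos=0, other_pos=2)
print(groups)
print("fully separated:", fully_separated(groups))
```

A real analysis would also need a significance measure and enough sequences per group; the sketch only shows how the conditional residue sets are formed.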

  30. The observed number and contribution of three correlation types in four different protein families. The correlation sets consist of 2 to over 20 residues.

     | Protein family (number of correlated positions/set) | Total number of correlation sets observed | Number of dispersed sets | Number of narrow clusters | Number of undirected clusters | Number of sets related to active center |
     |---|---|---|---|---|---|
     | Eglin-like proteins (2-13) | 20 | 7 | 7 | 6 | 1 |
     | Bowman-Birk proteinase inhibitors (2-28) | 23 | 4 | 13 | 6 | 9 |
     | Myoglobins (2-29) | 41 | 23 | 9 | 9 | n.a. |
     | Lysozymes (2-15) | 41 | 25 | 9 | 7 | 9 |
     | All families | 125 (100%) | 59 (47.2%) | 38 (30.4%) | 28 (22.4%) | - |

  31. A mathematician–biologist dialogue: the communication problem. "Bowls are convex." "Bowls are concave."

  32. In the entire splendour of natural phenomena, the first conclusion is not always correct, nor is the first impression always consistent with reality.

  33. Thank you for your attention!!!
