1 / 57

Genetic semihomology algorithm

Genetic semihomology algorithm. A new approach to the comparative analysis of protein sequences.

trula
Télécharger la présentation

Genetic semihomology algorithm

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Genetic semihomology algorithm A new approach to the comparative analysis of protein sequences It is admitted that from evolutionary point of view the genetic code and amino acid ‘language’ have evolved simultaneously. They also act with strict coherence with each other. Therefore, in analysis of protein differentiation and variability, both levels should be considered simultaneously.

  2. The tools currently used for comparative sequence analysis apply the Markovian model of amino acid replacement and are based on stochastic matrices of the observed amino acid substitution frequencies

  3. BLOSUM62 matrix of amino acid replacements

  4. BLAST protein search - output BLASTP 2.2.2 [Dec-14-2001]

  5. Genetic semihomology algorithm The aim of the new algorithm elaboration is to overcome the basic disadvantages of protein sequence analysis tools and to exclude some basic errors in the assumptions of the existing statistical methods. It is to be able to explain the mechanism and pathway of protein evolution and differentiation, not only limited to the description of the initial and final step of the observed changes.

  6. Genetic semihomology algorithm Another goal of this algorithm is to make it applicable to any group of proteins of any nature, function and location. It can be achieved for two reasons: 1) minimization of basic assumptions limited to the general amino acid: codon translation table and assuming that single point mutation is a principle, most common, mechanism of protein variability; 2) non-statistical approach (no stochastic matrices)

  7. Genetic semihomology algorithm The algorithm of genetic semihomology assumes the close relation between the compared amino acids and their codons in related proteins. The algorithm is based on the network of genetic relationship between amino acids. Such assumption makes the same residues at different positions of the sequence unequal with respect to their changeability.

  8. Genetic semihomology algorithm The algorithm assumes that the basic mechanism of differentiation among related proteins consists in the single point mutation. The general part of the algorithm is the three-dimensional diagram reflecting the network of genetic relationship between amino acids

  9. Diagram of amino acid genetic relationships Diagram of codon genetic relationships

  10. Semihomology algorithmInput requirements The minimum data required for starting analysis with the algorithm of genetic semihomology are the protein sequences (at least two). The more sequences are used for analysis and multiple alignment construction - the more concise and accurate the results are. Although the nucleotide sequences of the genes are not necessarily required, it is very helpful if the nucleotide sequences are known at least for some of the analyzed proteins. That increases significantly the amount of the information accessible from such analysis. Also the results are the best for sequences revealing sufficiently high degree of diversity.

  11. GTT AAT TGC AGC CTG TAT GCC AGC GGC ATC GGC AAG GAT GGG ACG AGT TGG GTA GCC 1) V N C S L Y A S G I G K D G T S W V A ATT GAT TGC TCT CCG TAC CTC CAA GTT GTA AGA GAT GGT AAC ACC ATG GTA GCC 2) I D C S P Y L Q - V V R D G N T M V A UNITARY MATRIX % V N C S L Y A S G I G K D G T S W V A SCORE I D C S P Y D G N T M V A <L Q V V R> 0 0 1 1 0 1 0 0 0 0 0 0 1 1 0 0 0 1 1 7/19 36.8 GENETIC CODE MATRIX GTT AAT TGC AGC CTG TAT GCC AGC GGC ATC GGC AAG GAT GGG ACG AGT TGG GTA GCC ATT GAT TGC TCT CCG TAC CTC GTT GTA AGA GAT GGT AAC ACC ATG GTA GCC < CAA > 2 2 3 0 2 2 1 0 0 1 1 1 3 2 1 1 1 3 3 29/57 50.9 PAM250 SCORING 42/97 43.3 V N C S L Y A S G I G K D G T S W V A 42/89 47.2 I D C S P Y L V V R D G N T M V A < Q > 1 1 2 2 0 2 0 0 0 1 0 1 2 2 1 1 0 2 2 20/38 52.6 GENETIC SEMIHOMOLOGY V N C S L Y A S G I G K D G T S W V A < Q > I D C S P Y L V V R D G N T M V A 2 2 3 3 2 3 0 0 0 2 1 2 3 3 1 1 0 3 3 34/57 59.6 Comparison of the fragments of 1st and 2nd domain of chicken ovomucoid using unitary matrix, GCM, PAM250 and algorithm of genetic semihomology

  12. Semihomology algorithmAdvantages of the approach The results obtained by using this method are more comprehensive than those of the methods used currently and reflect the actual mechanism of protein differentiation and evolution. They concern: 1) location of homologous and semihomologous sites in compared proteins, 2) precise estimation of gap location in non-identical fragments of different length, 3) analysis of internal homology and semihomology, 4) precise location of domains in multidomain proteins, 5) estimation of genetic code of non homologous fragments,

  13. Semihomology algorithmAdvantages of the approach 6) construction of genetic probes, 7) studies on differentiation processes among related proteins, 8) estimation of the degree of relationship among related proteins, 9) studies on the evolution mechanism within homologous protein families, 10) confirmation of the actual relationship between sequences revealing low degree of identity/similarity.

  14. Semihomology algorithmAdvantages of the approach • Application of the semihomology approach has led to discovery and describing some important mechanisms affecting the protein evolution and differentiation. The most important are: • the mechanism and role of cryptic mutations at unusually variable positions (Leluk; 2000b-c) • 2) the phenomenon of very long distance (dispersed) mutational correlation within sets of variable positions (Leluk 2000a; Leluk and Grabiec, 2001; Leluk et al., 2001b; Leluk et al., 2002)

  15. Semihomology algorithmLimitations of the approach The limitations in use of the semihomology approach appear in case if: - there are too few sequences taken for analysis - the identity degree among the sequences is too high (too low diversity) - the long fragments of the compared sequences show no identity at all (e.g. N-terminal signal fragments of homologous proteins)

  16. The analysis of genetic semihomology excludes applicability of Markov model for the studies on protein variability at the amino acid level. The amino acid codons do contain the information about the „ancestral” amino acids, whose codons were the starting point to the codon of current residue. It refers mainly to the positions undergoing single-point mutations as the most basic mechanism of evolutionary variability.

  17. GEISHA Software based on genetic semihomology algorithm - Protein sequence similarity and homology analysis. - Multiple alignment construction on the basis of genetic relationships between amino-acids. - Analysis of variability within homologous protein families - Molecular phylogenetic studies

  18. Software based on genetic semihomology algorithm FQS Semihomology toolhttp://www.fqspl.com.pl/sh/

  19. 19. DEGQK 25. NHEQDS 7. LYW 14. RSDA The semihomologous correlation between amino acids occurring at selected non-homologous positions of inhibitors from squash seeds. The solid lines indicate the transition type of semihomology, the dashed lines refer to transversion type of semihomology.

  20. Application of genetic semihomology algorithm for identifying the fragments of possible different mechanism than single point mutation

  21. Dot matrix pairwise alignment Noise reduction

  22. Dot matrix pairwise alignment Internal homology (gene multiplication) SEMIHOM BLAST 2 SEQUENCES Chicken ovoinhibitor precursor (7 domains) Chicken ovomucoid precursor (3 domains)

  23. Dot matrix comparison of selected homologous Kazal inhibitors ovoinhibitor-ovoinhibitor ovoinhibitor-ovomucoid ovoinhibitor-PSTI BLASTP 2.0.9 (Blosum62) SEMIHOM (algorithm of genetic semihomology)

  24. Raw dot matrix Noise filtering marking of whole fragments adding semihomology Consecutive steps of dot matrix results obtained from comparison of chicken ovoinhibitor with itself by program SEMIHOM

  25. Precise gap location in non-identical fragments of different length by using the genetic semihomology algorithm. The compared proteins are trypsin inhibitors from squash seeds - CPGTI-I (horizontal) and CSTI-IIb (vertical). identities and semihomology The analysis of semihomology shows the genetic relationship between His25 and Asp25, therefore a gap in CPGTI-I should be located next to Ile26 in CSTI-IIb. Window size = 10; minimum number of homologous positions in window = 4. ? RVCPKILMECKKDSDCLAECICLEH-GYCG MVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS RVCPKILMECKKDSDCLAECICLEH-GYCG MVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS ? RVCPKILMECKKDSDCLAECICLE-HGYCG MVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS identities only CPGTI-I has a non homologous His25 while CSTI-IIb possesses at relative site two non homologous residues Asp25 and Ile26.

  26. Multiple alignment

  27. Multiple alignment of seven chicken ovoinhibitor domains obtained with Markovian and non-Markovian methods

  28. Comparison of multiple alignment consensus results for three inhibitor families achieved with ClustalW, Multalin (both applied BLOSUM62) and algorithm of genetic semihomology Consensus alignment results for Bowman Birk inhibitors 1 10 20 30 40 50 60 ClustalW .. ...**: * ** * ** * * *: ::*** *. * *: * * :* * * .*** *: Multalin .d....aCC#.C.CTkS.PPqCrC.Dir.#tCHSaCksCiCtrS.PpqCrC.Dtt.FCYk.C. GS AA S.s.SSS**S.*S**s*S**.*S*S*SS*.S***S*.S*s*Ss*s*SS*s*s*SSS***ss*S GS NA sss.SSSS*S.*.**.*SS*sSs*s*SS*sSSS*SSssSsSSsSsSSSS.SsSSsS*SSs.Ss Consensus alignment results for squash inhibitors 1 10 20 30 ClustalW ** * * : ** * * ** Multalin ....r.CPrIlm.Ck.DsDCla.C.Cl...G.CG.. GS AA SS**S*sSS*SsSs**SSS*S*SSS.SS** Consensus alignment results for ovoinhibitor domains (Kazal type) 1 10 20 30 40 50 60 ClustalW .* : *. *.::. ** . * :* : : . * Multalin .dcs.y......dg...vaCp.il.pvCgt#gvTYsneC..Cahn.#..t...k..dg.C...... GS AA SS*sSSSSSSSSS*SSSS.*Sss..ss*sSSSS**sSs*sS*.sSSSsSSsSS.sssSS*SSSsss GS NA sSSs.S.S....sSSssss*Sss..ssssSSSSSSsSsS.sS.sS.S...sSSs..sSSSssss.s

  29. The dot matrix comparison of human erythrocyte alpha-spectrin with itselfFor the best visualization of the repeats, the identity threshold is set as 15, and frame size as 75

  30. The multiple alignment of proteins homologous to the 10th segment of human erythrocyte -spectrin

  31. The multiple alignment of human erythrocyte α-spectrin 23 segments achieved with the MultAlin (BLOSUM62) method and genetic semihomology approach

  32. The dot matrix comparison of human erythrocyte α-spectrin with the consensus of 106-residue repeats The consensus achieved with genetic semihomology approach [Leluk, 1998]. The identity threshold and frame size are set as 8 and 40 respectively. The identity and genetic semihomology of the compared residues is visualized The consensus described by Sahr et al. [1990]. The identity threshold and frame size are set as 8 and 40 respectively.

  33. The use of α-spectrin consensus sequence achieved with different algorithms for sequence similarities search (BLAST) Query sequence: α-Spectrin segment consensus sequence obtained with MultAlin program (BLOSUM62) Score E-value Sequences producing significant alignments: (bits)sp|P02549|SPCA_HUMAN SPECTRIN ALPHA CHAIN, ERYTHROCYTE 17 21352 sp|Q01082|SPCO_HUMAN SPECTRIN BETA CHAIN, BRAIN (SPECTRIN,...16 27970 sp|P08032|SPCA_MOUSE SPECTRIN ALPHA CHAIN, ERYTHROCYTE 16 27970 sp|P07751|SPCN_CHICK SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,... 16 27970 sp|Q13813|SPCN_HUMAN SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN... 16 27970 sp|Q62261|SPCO_MOUSE SPECTRIN BETA CHAIN, BRAIN (SPECTRIN, ... 16 27970 sp|P16086|SPCN_RAT SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN, N... 16 36639 sp|Q00963|SPCB_DROME SPECTRIN BETA CHAIN 16 47996 sp|P13395|SPCA_DROME SPECTRIN ALPHA CHAIN 15 62874 sp|P16546|SPCN_MOUSE SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,... 14 141334 sp|P15508|SPCB_MOUSE SPECTRIN BETA CHAIN, ERYTHROCYTE 14 185142 sp|P11277|SPCB_HUMAN SPECTRIN BETA CHAIN, ERYTHROCYTE 14 185142 sp|P39254|Y04O_BPT4 HYPOTHETICAL 36.3 KD PROTEIN IN NRDC-M... 11 935541

  34. The use of α-spectrin consensus sequence achieved with different algorithms for sequence similarities search (BLAST) Query sequence: α-Spectrin segment consensus sequence attained by Sahr et al. [1990] Score E-value Sequences producing significant alignments: (bits) sp|P13395|SPCA_DROME SPECTRIN ALPHA CHAIN 43 2e-04 sp|Q13813|SPCN_HUMAN SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN... 40 0.001 sp|P07751|SPCN_CHICK SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,... 39 0.003 sp|P16546|SPCN_MOUSE SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,... 39 0.004 sp|P16086|SPCN_RAT SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN, N... 37 0.012 sp|P02549|SPCA_HUMAN SPECTRIN ALPHA CHAIN, ERYTHROCYTE 37 0.016 sp|P08032|SPCA_MOUSE SPECTRIN ALPHA CHAIN, ERYTHROCYTE 36 0.021 sp|P15508|SPCB_MOUSE SPECTRIN BETA CHAIN, ERYTHROCYTE 35 0.048 sp|Q00963|SPCB_DROME SPECTRIN BETA CHAIN 35 0.063 sp|Q01082|SPCO_HUMAN SPECTRIN BETA CHAIN, BRAIN (SPECTRIN,... 34 0.082 sp|Q62261|SPCO_MOUSE SPECTRIN BETA CHAIN, BRAIN (SPECTRIN, ... 34 0.11 sp|P11277|SPCB_HUMAN SPECTRIN BETA CHAIN, ERYTHROCYTE 32 0.32 sp|P05095|AACT_DICDI ALPHA-ACTININ 3, NON MUSCULAR (F-ACTIN... 21 797 sp|P34367|YLJ2_CAEEL HYPOTHETICAL 256.3 KD PROTEIN C50C3.2 ... 20 1791 sp|Q03001|BPA1_HUMAN BULLOUS PEMPHIGOID ANTIGEN 1 (BPA) (H... 19 3073 sp|P30427|PLEC_RAT PLECTIN 19 4026 sp|P31670|GT27_FASHE GLUTATHIONE S-TRANSFERASE 26 KD 47 (GS... 19 4026 sp|P46125|YEDI_ECOLI HYPOTHETICAL 32.2 KD PROTEIN IN DSRB-V... 18 6908 sp|P42094|PHYT_BACSU 3-PHYTASE PRECURSOR (PHYTATE 3-PHOSPHA... 18 6908 sp|P56288|E2BG_SCHPO PROBABLE TRANSLATION INITIATION FACTOR... 18 9049 sp|O00273|DFFA_HUMAN DNA FRAGMENTATION FACTOR ALPHA SUBUNI... 18 9049 sp|P12311|ADH1_BACST ALCOHOL DEHYDROGENASE (ADH-T) 18 9049

  35. The use of α-spectrin consensus sequence achieved with different algorithms for sequence similarities search (BLAST) Query sequence: α-Spectrin segment consensus sequence attained by genetic semihomology algorithm Score E-value Sequences producing significant alignments: (bits)sp|P13395|SPCA_DROME SPECTRIN ALPHA CHAIN 49 3e-06 sp|Q13813|SPCN_HUMAN SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN... 47 1e-05 sp|P07751|SPCN_CHICK SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,... 42 3e-04 sp|P16086|SPCN_RAT SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN, N... 41 6e-04 sp|P02549|SPCA_HUMAN SPECTRIN ALPHA CHAIN, ERYTHROCYTE 38 0.005 sp|Q00963|SPCB_DROME SPECTRIN BETA CHAIN 37 0.014 sp|P08032|SPCA_MOUSE SPECTRIN ALPHA CHAIN, ERYTHROCYTE 36 0.019 sp|P15508|SPCB_MOUSE SPECTRIN BETA CHAIN, ERYTHROCYTE 36 0.024 sp|P16546|SPCN_MOUSE SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,... 36 0.032 sp|Q01082|SPCO_HUMAN SPECTRIN BETA CHAIN, BRAIN (SPECTRIN,... 34 0.094 sp|Q62261|SPCO_MOUSE SPECTRIN BETA CHAIN, BRAIN (SPECTRIN, ... 34 0.094 sp|P11277|SPCB_HUMAN SPECTRIN BETA CHAIN, ERYTHROCYTE 33 0.21 sp|P34367|YLJ2_CAEEL HYPOTHETICAL 256.3 KD PROTEIN C50C3.2 ... 20 1195 sp|Q99001|AACB_CHICK ALPHA-ACTININ, BRAIN ISOFORM (F-ACTIN ... 20 1566 sp|P35609|AAC2_HUMAN ALPHA-ACTININ 2, SKELETAL MUSCLE ISOF... 20 1566 sp|P12814|AAC1_HUMAN ALPHA-ACTININ 1, CYTOSKELETAL ISOFORM ... 20 1566 sp|P05094|AACT_CHICK ALPHA-ACTININ, SMOOTH MUSCLE ISOFORM (... 20 1566 sp|P30427|PLEC_RAT PLECTIN 19 2687 sp|P20111|AACS_CHICK ALPHA-ACTININ, SKELETAL MUSCLE ISOFORM... 19 3520 sp|P47493|SYG_MYCGE GLYCYL-TRNA SYNTHETASE (GLYCINE--TRNA L... 18 4611 sp|Q03001|BPA1_HUMAN BULLOUS PEMPHIGOID ANTIGEN 1 (BPA) (H... 18 7913

  36. Kinase project Multiple alignment of hexokinase domains HEXOKINASE_I (2 domains)HEXOKINASE_II (2 domains) HEXOKINASE_III (2 domains) HEXOKINASE_IV (1 domain)

  37. Kinase project • Complex comparative studies at the primary structure level • Construction of molecular phylogenetic trees • Studies on sequence/structure/function relationship • Studies on the mechanisms of correlated mutations and variability • Genetic principles of differentiation within this protein family

  38. Simplified (planar) diagram of genetic relationships between amino acids In planar diagram the encoding role of the third codon position is ignored. Only first two codon positions are taken into account.

  39. Simplified (planar) diagram of genetic relationships between amino acids The simplified planar diagram emphasizes the special encoding character of six-codon amino acids – Leu, Arg and Ser. The six-codon amino acids may play the role the of „mutational passages” that are not liable to the selection restrictions. Conclusions? These amino acids may influence on the variability range increase. In fact the six-codon amino acids occur unusually frequent at very variable positions. This concerns especially serine, and to lesser extent – arginine. Leucine does not show the correlation between the frequency of occurrence and variability range.

  40. Frequency of six-codon amino acids as a function of position variability in randomly selected proteins of different origin and nature The results for 2686 residues at 606 corresponding positions

  41. Phylogenetic tree verification The ovoinhibitor domains homology comparison. The similarity scores (%) for the aligned DNA-coding sequences are shown above the diagonal. Below diagonal are the similarity scores for the aligned amino acid sequences. The results obtained by Scott et al. Scott M. J., Huckaby C. S., Kato I., Kohr W. J., Laskowski M. Jr, Tsai M.-J. and O'Malley B. W. (1987) J. Biol. Chem. 262, 5899-5907 The results obtained by semihomology analysis

  42. Virtual reverse translation Deducing the genetic code QRCRRDSDCKKCRMDSDC Arg codons: AGA AGG CGA CGG CGC CGT Met codons: ATG

  43. Virtual reverse translation Deducing the genetic code QRCRRDSDCKKCRMDSDC Arg codons: AGA AGG CGA CGG CGC CGT Met codons: ATG

  44. Virtual reverse translation

  45. K E Q – K E Q – N D H Y N D H Y R G R – AGCT 1 R G R W S G R C S G R C 3 2 T A P S T A P S T A P S T A P S I V L L V L L I V L F I V L F Virtual reverse translation M

  46. K E Q – K E Q – N D H Y N D H Y AGA G CGA – AGCT 1 AGG G CGG W S G CGC C S G CGT C 3 2 T A P S T A P S T A P S T A P S I V L L V L L I V L F I V L F Virtual reverse translation ATG

  47. K E Q – K E Q – N D H Y N D H Y AGA CGA G – AGCT 1 CGG G W S CGC G C S CGT G C 3 2 T A P S T A P S T A P S T A P S I V L L V L L I V L F I V L F Virtual reverse translation AGG ATG

  48. R R Genetic relationships between Arg and Met/Gln Q K E – Q K E – N D H Y N D H Y R G – R G W S G R C S G R C T A P S T A P S T A P S T A P S I V L L M V L L I V L F I V L F

  49. Genetic semihomology prediction of genetic code for chicken ovoinhibitor domains. The predicted gene sequence is compared with the known DNA sequence. Only those positions where prediction reduces possible codons are considered. The predicted codons that are consistent with those present in the ovoinhibitor gene are shadowed.

  50. References (1) Leluk, J., Pham, T.-C. (1985) Calculation and analysis of protein sequence homology using microcomputer ZX Spectrum. VIIth Polish Conferrence "Chemistry of Amino Acids and Peptides", Abstracts, 83 Kubiak, Z.J., Leluk, J. (1986) Application of the ZX Spectrum microcomputer for prediction of the secondary structure of proteins. Pr. Nauk. Inst. Chem. Nieorg. Metal. Pierwiastków Rzadkich. Wrocław 55, 186-189 Leluk, J. (1993) Analysis of protein primary structure using program HOMOLOGY. XXIXth Meeting of Polish Biochemical Society, Abstracts, 384 Leluk, J., Krowarsch, D. (1994) A new algorithm for analysis the homology in protein primary structure, 3rd Conferrence "Computers in Chemistry", Wrocław, Poland, Abstracts, 60. Leluk, J. (1994) Application of program SEMIHOM based on a new algorithm for homology analysis of protein primary structure. Ist Conference "Computer-based Scientific Research", Wrocław, Poland, Abstracts, 189-194. . Leluk, J. (1996) Comparison of the algorithm of genetic semihomology with currently applied algorithms for protein homology analysis. IIIrd Conferrence "Computer-based Scientific Research", Wrocław, Polanica-Zdrój, Abstracts, 53-58.

More Related