This guide provides a workflow for performing multiple sequence alignment of fertilization proteins from various abalone species using ClustalW and T-Coffee. Start by downloading the FASTA sequences from HomoloGene. Use ClustalW for initial alignment, and then apply T-Coffee for improved alignment accuracy. This process includes submission steps, options for generating guide trees, and the transformation of protein alignments into codon alignments for selection pattern studies. Get the sequences and explore evolutionary relationships among these proteins.
Multiple Sequence Alignment • ClustalW • T-Coffee • Ka, Ks, and Ka/Ks • Anchored alignment
ClustalW • http://www.ebi.ac.uk/clustalw/
ClustalW • Paste your sequences • Set the multiple sequence alignment options • Submit
Exercise • HomoloGene is a system for automated detection of homologs among annotated genes of several completely sequenced eukaryotic genomes. • Download the FASTA sequences of HomoloGene:5276 and align them with ClustalW
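Before pasting the downloaded file into ClustalW, it can help to check how many sequences it holds and how long each one is. A minimal Python sketch, assuming the file is standard FASTA (a `>` header line followed by one or more sequence lines); `parse_fasta` is a hypothetical helper of our own, not part of ClustalW or HomoloGene.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}.

    The header is everything after '>' on the title line. Sequence lines are
    joined with spaces removed. A repeated header overwrites the earlier record.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.replace(" ", ""))
    if header is not None:
        records[header] = "".join(chunks)
    return records
```

For example, `{h: len(s) for h, s in parse_fasta(text).items()}` lists each header with its sequence length, where `text` holds the contents of the downloaded file.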
Result • Alignment • Guide tree
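ClustalW builds its guide tree by neighbor joining on pairwise distances between the sequences. The sketch below is a plain textbook neighbor-joining implementation, not ClustalW's own code. It assumes you already have a symmetric distance matrix (for example, one minus the fraction of identical residues in each pairwise alignment) and returns an unrooted tree in Newick format; `neighbor_joining` is a hypothetical name.

```python
from typing import List


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Return an unrooted neighbor-joining tree in Newick format."""
    if len(names) < 3:
        raise ValueError("need at least three sequences")
    nodes = list(names)
    d = [list(map(float, row)) for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # net divergence of each node
        # Q-criterion: join the pair minimising (n - 2) * d_ij - r_i - r_j
        _, i, j = min(((n - 2) * d[a][b] - r[a] - r[b], a, b)
                      for a in range(n) for b in range(a + 1, n))
        # Branch lengths from the new internal node to i and to j
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:.4g},{nodes[j]}:{lj:.4g})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new node to every remaining node
        new_row = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Resolve the last three nodes as a trifurcation
    a, b, c = nodes
    la = (d[0][1] + d[0][2] - d[1][2]) / 2
    lb = (d[0][1] + d[1][2] - d[0][2]) / 2
    lc = (d[0][2] + d[1][2] - d[0][1]) / 2
    return f"({a}:{la:.4g},{b}:{lb:.4g},{c}:{lc:.4g});"
```

Ties in the Q-criterion are broken by taking the first pair in index order, so for tied inputs the printed topology can depend on the order of the sequences.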
T-Coffee • http://tcoffee.crg.cat/ • T-Coffee computes its multiple alignment by combining the information in a collection of smaller (pairwise) alignments.
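One way to picture the "collection of smaller alignments": in the published T-Coffee method, every pair of sequences is aligned (globally and locally), and each aligned residue pair is stored in a library, weighted by the identity of the pairwise alignment it came from; pairs supported by several alignments accumulate weight. The Python sketch below builds only that primary library under these simplifying assumptions. It is not T-Coffee's code, it omits the library extension and progressive alignment steps, and `add_pairwise_alignment` is a hypothetical name.

```python
from collections import defaultdict

# Library key: (seq_a, residue index in a, seq_b, residue index in b), 0-based.


def add_pairwise_alignment(library, name_a, aln_a, name_b, aln_b,
                           start_a=0, start_b=0):
    """Add every aligned residue pair of one pairwise alignment to the library.

    Each pair is weighted by the percent identity of that alignment; pairs
    found in several alignments add up. start_a and start_b give the residue
    offsets at which a local alignment begins.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned rows must have equal length")
    pairs, identical = [], 0
    i, j = start_a, start_b
    for x, y in zip(aln_a, aln_b):
        if x != "-" and y != "-":
            pairs.append((i, j))
            if x == y:
                identical += 1
        if x != "-":
            i += 1
        if y != "-":
            j += 1
    weight = 100.0 * identical / len(pairs) if pairs else 0.0
    for i, j in pairs:
        library[(name_a, i, name_b, j)] += weight
    return library


# Usage: combine a global and a local alignment of the same two sequences.
lib = defaultdict(float)
add_pairwise_alignment(lib, "A", "MKV-L", "B", "MRVAL")                 # global
add_pairwise_alignment(lib, "A", "KV", "B", "RV", start_a=1, start_b=1)  # local
```

Residue pairs that both alignments agree on end up with the highest weights, which is what later steps use to favour consistent pairs.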
Alignment at the DNA level based on an alignment at the protein level • The 18-kDa protein plays an important role in fertilization in several abalone species • Build a multiple sequence alignment using the following sequences:
Sequences
>gi|604533|gb|AAC37231.1| fertilization protein
MRSLVLLCVLLMAICAADKKTSVSKENEAAMKVAMMKFLDMKAGVFKEIIEDMGYPITPPQWTTLLYYNR
ERLIEFCRSFLALSKKIILLGGNKLNKANFARMGRILGWKSQWAVRQRQWGMVRVSRRHTSTAIAKRIVA
MKVADLPCN
>gi|604531|gb|AAC37233.1| fertilization protein
MRFLLLLCVLMGAVSQAVCRKRPNVWGKIVVKEKNKAAMKIGFMEYLDAKLVKFKRHWLVGANWKLQKFE
TDEMRYLAIKRLIKVCHGYTIWSQRLIMLKYRPLNEKYFKKVGRYLAWRNYLIVFRMWIGVLKKNLKRSE
ITKPMQKLLDTKDGELPCPVRKIHG
>gi|604529|gb|AAC37232.1| fertilization protein
MRSLVLLCVLMAVGCVAFDDVVVSRQEQSYVQRGMVNFLDEEMHKLVKRFRDMRWNLGPGFVFLLKKVNR
ERMMRYCMDYARYSKKILQLKHLPVNKKTLTKMGRFVGYRNYGVIRELYADVFRDVQGFRGPKMTAAMRK
YSSKDPGTFPCKNEKRRG
>gi|604527|gb|AAC37230.1| fertilization protein
MRSLVLLCVLLMAICAADKKTTVSKENAAAMKIAMIKFLDARAGKFKKRVENMGYPITPPQWTTLLYYNR
QRLMEWCHTYVEFSKKIILMGGNKLNKKNFTRMGRIIGWKNQWVLKRRQWEMVRVMRRYKSTAIAKKIVA
MKVADLPCN
>gi|604525|gb|AAC37229.1| fertilization protein
MRSLVLLCVLLMAICAADKKSTVSKENAAAMKVAMIKFLDSRTDRFKKRIEKIGYPITPPQYTTLLYYNR
ERLMDWCHNYVEVSKKIILLGGNKLNKKNFARMGRIIGWKNQWILKRRQWHMVRVMRRYKASAIAKKIVA
MKVADLPCN
Choose T-Coffee Regular, paste the sequences into the data box, and press Submit.
Download formats • Guide tree
Codon Alignment • To study selection patterns, you need the corresponding DNA alignment. • Using PROTOGENE (Protein-to-Gene) in T-Coffee, the amino-acid alignment is transformed into a codon alignment. The procedure involves a tBLASTn search.
PROTOGENE (in T-Coffee) is time-consuming. Submit your email address, and the results will be emailed to you. • PROTOGENE may return more than one DNA sequence for a given protein sequence. For your homework assignment, choose one sequence for each species.
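Once one coding sequence has been chosen per species, the transformation itself is mechanical: each residue of the aligned protein is replaced by its codon, and each gap by `---`. A minimal Python sketch of that threading step, assuming the coding sequence is ungapped, starts at the first codon, and may end with one extra stop codon; it does not reproduce the tBLASTn search PROTOGENE uses to find the DNA, and `thread_codons` is a hypothetical name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def thread_codons(aligned_protein: str, cds: str) -> str:
    """Map an aligned protein row onto its coding sequence.

    Residues become their codons, gaps become '---'. A trailing stop codon
    in the CDS, if present, is appended at the end of the row.
    """
    cds = cds.upper().replace("-", "")
    n_residues = len(aligned_protein.replace("-", ""))
    trailing = ""
    if len(cds) == 3 * (n_residues + 1) and cds[-3:] in STOP_CODONS:
        trailing = cds[-3:]
        cds = cds[:-3]
    if len(cds) != 3 * n_residues:
        raise ValueError(f"CDS length {len(cds)} does not match {n_residues} residues")
    out = []
    pos = 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")
        else:
            out.append(cds[pos:pos + 3])
            pos += 3
    return "".join(out) + trailing
```

Applying it to every row of the protein alignment keeps the codons in frame, so each alignment column of the protein becomes exactly three columns of DNA.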
(Result) Codon alignment
>gi|604533|gb|AAC37231.1|_G_L36554 _S_ AAC37231 _DESC_ fertilization protein MATCHES_ON Haliotis assimilis fertilization protein mRNA, complete cds
ATGAGGTCTTTGGTGCTTCTCTGTGTTTTGCTGATGGCAATATGTGCGGCGGAC------
------------------AAAAAAACCTCGGTCTCGAAGGAAAATGAAGCCGCAATGAAG
GTAGCGATGATGAAGTTTTTGGATATGAAGGCGGGTGTATTCAAAGAAATC---ATTGAG
GATATGGGATATCCAATAACCCCTCCGCAATGGACAACTCTACTGTACTACAACAGAGAG
AGATTGATTGAATTTTGCCGTTCCTTCCTTGCATTGTCCAAAAAGATTATATTGCTGGGA
GGTAACAAATTAAATAAGGCGAACTTCGCTAGGATGGGTCGAATCCTTGGCTGGAAAAGC
CAGTGGGCTGTGAGACAGAGGCAATGGGGGATGGTCAGA---------GTGTCGAGGCGC
CATACAAGTACTGCAATAGCTAAAAGGATCGTCGCCATGAAAGTTGCTGACCTACCCTGT
AAC------------------TAG
>gi|604531|gb|AAC37233.1|_G_L36590 _S_ AAC37233 _DESC_ fertilization protein MATCHES_ON Haliotis corrugata fertilization protein mRNA, complete cds
ATGAGGTTTTTGCTGCTTCTCTGTGTTTTGATGGGGGCAGTATCTCAGGCAGTATGCAGA
AAAAGACCTAATGTCTGGGGGAAAATCGTGGTCAAGGAGAAAAATAAAGCCGCAATGAAG
ATAGGGTTTATGGAATATTTGGATGCAAAGTTGGTAAAGTTTAAAAGGCACTGGCTTGTT
GGAGCCAATTGGAAACTTCAAAAATTTGAAACGGATGAAATGAGATACCTCGCCATAAAG
AGACTGATAAAAGTTTGCCATGGATACACTATTTGGTCCCAACGACTAATAATGTTAAAA
TATCGACCATTGAATGAGAAATACTTCAAAAAGGTGGGTCGATACCTTGCCTGGCGAAAC
TACCTCATAGTTTTTCGGATGTGGATCGGCGTTTTG------AAGAAAAATCTTAAAAGA
TCGGAAATAACGAAACCCATGCAAAAACTCCTCGACACAAAGGATGGTGAGTTGCCCTGC
CCTGTTAGAAAGATACATGGATAA
>gi|604529|gb|AAC37232.1|_G_L36589 _S_ AAC37232 _DESC_ fertilization protein MATCHES_ON Haliotis fulgens fertilization protein mRNA, complete cds
ATGAGGTCTTTGGTGCTTCTCTGTGTTTTGATGGCGGTAGGATGTGTGGCGTTT------
------------------GATGATGTGGTGGTCTCAAGGCAAGAGCAATCTTATGTGCAG
AGAGGGATGGTCAACTTTTTGGATGAAGAAATGCATAAACTGGTTAAACGG---TTTAGA
GATATGCGATGGAATTTAGGGCCAGGCTTTGTATTCCTTCTAAAAAAAGTCAACAGAGAG
AGAATGATGCGCTACTGCATGGATTACGCCAGATATTCCAAAAAGATTTTACAGCTAAAA
CATCTTCCAGTAAATAAGAAGACCCTCACTAAAATGGGTAGATTCGTTGGATATCGAAAC
TATGGGGTCATCAGGGAGTTGTACGCCGACGTATTCAGAGACGTTCAAGGATTTAGGGGG
CCTAAAATGACTGCAGCCATGAGGAAGTACAGCAGCAAGGATCCTGGTACATTTCCTTGC
AAGAACGAGAAACGCCGCGGATGA
>gi|604527|gb|AAC37230.1|_G_L36553 _S_ AAC37230 _DESC_ fertilization protein MATCHES_ON Haliotis sorenseni fertilization protein mRNA, complete cds
ATGAGGTCTTTGGTGCTTCTCTGTGTTTTGCTGATGGCAATATGTGCGGCGGAC------
------------------AAAAAAACCACGGTCTCGAAGGAAAATGCAGCCGCAATGAAG
ATAGCTATGATAAAGTTTTTGGATGCGAGGGCGGGTAAATTCAAAAAACGC---GTTGAG
AATATGGGATATCCAATAACCCCTCCGCAATGGACAACTCTACTATACTACAACAGACAG
AGATTGATGGAATGGTGCCATACCTACGTTGAATTTTCCAAAAAGATTATATTGATGGGA
GGTAACAAATTAAATAAGAAGAACTTCACTAGGATGGGTCGAATCATTGGCTGGAAAAAC
CAGTGGGTTTTGAAAAGGAGGCAATGGGAGATGGTCAGA---------GTGATGAGGCGC
TATAAAAGTACTGCAATAGCTAAAAAGATCGTCGCCATGAAAGTTGCTGACCTACCCTGT
AAC------------------TAG
>gi|604525|gb|AAC37229.1|_G_L36552 _S_ AAC37229 _DESC_ fertilization protein MATCHES_ON Haliotis rufescens fertilization protein mRNA, complete cds
ATGAGGTCTTTGGTGCTTCTCTGTGTTTTGCTGATGGCAATATGTGCGGCGGAC------
------------------AAAAAATCCACGGTCTCGAAGGAAAATGCAGCCGCAATGAAG
GTAGCGATGATAAAGTTTTTGGATTCGAGGACGGATAGATTCAAAAAACGC---ATTGAG
AAGATTGGATATCCAATAACCCCTCCGCAATATACAACTCTACTATACTACAACAGAGAG
AGATTGATGGATTGGTGCCATAACTACGTTGAAGTATCCAAAAAGATTATATTGTTGGGA
GGTAACAAATTAAATAAGAAGAACTTCGCTAGGATGGGTCGAATCATTGGCTGGAAAAAC
CAGTGGATTTTGAAAAGGAGGCAATGGCACATGGTCAGA---------GTGATGAGGCGC
TATAAAGCTTCTGCAATAGCTAAAAAGATCGTCGCCATGAAAGTTGCTGACCTACCCTGT
AAC------------------TAG
SNAP - Ds/Dn Calculation Tool • http://hcv.lanl.gov/content/sequence/SNAP/SNAP.html • Calculates synonymous and nonsynonymous substitution rates from a codon alignment using the method of Nei and Gojobori (1986).
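The Nei and Gojobori (1986) calculation can be sketched in a few steps: count synonymous (S) and nonsynonymous (N) sites in each codon, count synonymous (Sd) and nonsynonymous (Nd) differences between aligned codons (averaging over mutational pathways when codons differ at more than one position), take pS = Sd/S and pN = Nd/N, and apply the Jukes-Cantor correction d = -3/4 ln(1 - 4p/3). The Python sketch below follows those steps under stated assumptions: the standard genetic code, changes to stop codons counted as nonsynonymous sites, pathways through stop codons discarded, and codons with gaps or stops skipped. SNAP's own conventions may differ in these details, and all function names are hypothetical.

```python
import itertools
import math

# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def syn_sites(codon):
    """Synonymous sites: fraction of single-base changes that keep the amino
    acid, summed over positions (changes to stops count as nonsynonymous)."""
    count = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    count += 1
    return count / 3


def syn_diffs(c1, c2):
    """(Sd, Nd) between two codons, averaged over all orders of single-base
    steps; pathways passing through a stop codon are discarded."""
    positions = [p for p in range(3) if c1[p] != c2[p]]
    if not positions:
        return 0.0, 0.0
    sd_total = nd_total = 0.0
    n_paths = 0
    for order in itertools.permutations(positions):
        current, sd, nd, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                sd += 1
            else:
                nd += 1
            current = nxt
        if valid:
            sd_total += sd
            nd_total += nd
            n_paths += 1
    if n_paths == 0:
        raise ValueError(f"every pathway from {c1} to {c2} passes through a stop")
    return sd_total / n_paths, nd_total / n_paths


def ng86_counts(seq1, seq2):
    """Sum (S, N, Sd, Nd) over codons of two aligned coding sequences,
    skipping codons with gaps, ambiguous bases or stops in either sequence."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1) - 2, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        S += s
        N += 3 - s
        sd, nd = syn_diffs(c1, c2)
        Sd += sd
        Nd += nd
    return S, N, Sd, Nd


def jukes_cantor(p):
    """d = -3/4 ln(1 - 4p/3); infinite when p >= 0.75."""
    arg = 1 - 4 * p / 3
    return math.inf if arg <= 0 else -0.75 * math.log(arg)


def ng86(seq1, seq2):
    """Return (dS, dN) for two aligned coding sequences."""
    S, N, Sd, Nd = ng86_counts(seq1, seq2)
    if S == 0 or N == 0:
        raise ValueError("no comparable synonymous or nonsynonymous sites")
    return jukes_cantor(Sd / S), jukes_cantor(Nd / N)
```

A ratio dN/dS above 1 (equivalently dS/dN below 1) is the usual signal of positive selection.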
Input the codon alignment • Select the output statistics
SNAP - Ds/Dn Calculation Tool • Conclusion: we detect positive selection (dN/dS > 1) in six of the pairwise comparisons, as did Swanson and Vacquier (1998).
Distmat • http://emboss.bioinformatics.nl/cgi-bin/emboss/distmat • Distmat calculates the evolutionary distance between every pair of sequences in a multiple alignment. The distances are expressed as the number of substitutions per 100 nucleotides or the number of replacements per 100 amino acids.
Distmat • Feed the DNA alignment of the 18-kDa protein into distmat. • Calculate the distances between the sequences separately for codon positions 1 and 2 and for codon position 3. • Are the results in agreement with those from the dN/dS analysis?
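The exercise can also be sketched directly: split the codon alignment by codon position, then count differences per 100 compared sites. Because most third-position changes are synonymous and most first- and second-position changes are not, the two distances roughly track the synonymous and nonsynonymous rates. This Python sketch assumes the alignment stays in codon frame (as the PROTOGENE output does), skips columns with a gap in either sequence, and offers only uncorrected and Jukes-Cantor distances; distmat supports other correction models and may treat gaps differently. The function names are hypothetical.

```python
import math
from itertools import combinations


def select_positions(seq, positions):
    """Keep only the requested codon positions (1, 2 and/or 3) of an aligned CDS."""
    return "".join(base for i, base in enumerate(seq) if i % 3 + 1 in positions)


def distance_per_100(seq1, seq2, jukes_cantor=True):
    """Substitutions per 100 compared sites; gapped columns are skipped."""
    compared = diffs = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            diffs += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    p = diffs / compared
    if not jukes_cantor:
        return 100 * p
    arg = 1 - 4 * p / 3
    return math.inf if arg <= 0 else -75 * math.log(arg)  # 100 * (-3/4) ln(arg)


def distance_table(alignment, positions, jukes_cantor=True):
    """Pairwise distances for a {name: aligned CDS} dict at the chosen positions."""
    subset = {name: select_positions(seq, positions) for name, seq in alignment.items()}
    return {(a, b): distance_per_100(subset[a], subset[b], jukes_cantor)
            for a, b in combinations(subset, 2)}
```

Calling `distance_table(aln, {1, 2})` and `distance_table(aln, {3})` on the same alignment gives the two tables the exercise asks you to compare.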
Anchored multiple-sequence alignment with DIALIGN http://dialign.gobics.de/anchor/submission.php User manual: http://dialign.gobics.de/anchor/manual
Align the following sequences (use the file dalign_sequences.txt):
>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK
Results • DIALIGN makes alignments from fragments
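A fragment here is a gap-free segment pair: a stretch of one sequence lined up against an equal-length stretch of another, lying on a single diagonal of the comparison matrix. The toy Python sketch below finds the best fragment on each diagonal using a simple +1 match / -1 mismatch score and a maximum-subarray scan. This is a swapped-in simplification: DIALIGN weights fragments by how unlikely their similarity is by chance and then chains a consistent set of them into the alignment, neither of which is reproduced here. The function names are hypothetical.

```python
def best_on_diagonal(s, t, offset):
    """Best gap-free fragment on one diagonal, pairing s[i] with t[i - offset].

    Scores +1 per identical pair and -1 per mismatch.
    Returns (score, start in s, start in t, length), all 0-based.
    """
    i0 = max(offset, 0)
    j0 = i0 - offset
    best_score, best_start, best_len = 0, 0, 0
    run_score, run_start = 0, 0
    k = 0
    while i0 + k < len(s) and j0 + k < len(t):
        step = 1 if s[i0 + k] == t[j0 + k] else -1
        if run_score <= 0:  # a non-positive prefix never helps, so restart here
            run_score, run_start = step, k
        else:
            run_score += step
        if run_score > best_score:
            best_score, best_start, best_len = run_score, run_start, k - run_start + 1
        k += 1
    return best_score, i0 + best_start, j0 + best_start, best_len


def diagonal_fragments(s, t, min_score=2):
    """Best fragment on every diagonal of s versus t, highest score first."""
    frags = []
    for offset in range(-(len(t) - 1), len(s)):
        frag = best_on_diagonal(s, t, offset)
        if frag[0] >= min_score:
            frags.append(frag)
    return sorted(frags, reverse=True)
```

Running `diagonal_fragments` on seq1 and seq3 lists candidate fragments such as the shared KAAY stretch; a real DIALIGN run would additionally score them statistically and keep only a mutually consistent subset.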
Results • Numbers below the alignment give a rough measure of the local similarity among the sequences.
Anchored alignment • Now, let us assume that the user has some expert knowledge about a certain domain that is present in all the input sequences. • The domains marked in red in the three sequences are thought to be homologous to one another:
>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK
Therefore, the user wants to define this domain as an anchor and align the rest of the sequences automatically. • To specify a set of anchor points, define each anchor point as a pair of equal-length segments from two of the input sequences, given by:
• first sequence involved • second sequence involved • start of anchor in first sequence • start of anchor in second sequence • length of anchor
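The five fields above can be assembled and checked before submission. A minimal Python sketch, assuming 1-based sequence numbers and positions; the motif KAAY is a hypothetical choice for illustration (it is not necessarily the red domain), and the helper names are our own. Check the DIALIGN user manual for the exact anchor-file layout it expects, for example whether a further score column is required.

```python
from typing import List, Tuple

# (first seq, second seq, start in first, start in second, length), all 1-based
Anchor = Tuple[int, int, int, int, int]


def motif_anchor(seqs: List[str], a: int, b: int, motif: str) -> Anchor:
    """Anchor the first occurrence of `motif` in sequence a to its first
    occurrence in sequence b."""
    start_a = seqs[a - 1].find(motif)
    start_b = seqs[b - 1].find(motif)
    if start_a < 0 or start_b < 0:
        raise ValueError(f"motif {motif!r} is not present in both sequences")
    return (a, b, start_a + 1, start_b + 1, len(motif))


def check_anchor(seqs: List[str], anchor: Anchor) -> None:
    """Raise if either segment of the anchor runs outside its sequence."""
    a, b, start_a, start_b, length = anchor
    for idx, start in ((a, start_a), (b, start_b)):
        if length < 1 or start < 1 or start + length - 1 > len(seqs[idx - 1]):
            raise ValueError(f"anchor {anchor} does not fit sequence {idx}")


def anchor_line(anchor: Anchor) -> str:
    """Format an anchor as whitespace-separated fields."""
    return " ".join(str(v) for v in anchor)


seqs = ["WKKNADAPKRAMTSFMKAAY",
        "WNLDTNSPEEKQAYIQLAKDDRIRYD",
        "WRMDSNQKNPDSNNPKAAYNKGDANAPK"]
anc = motif_anchor(seqs, 1, 3, "KAAY")  # hypothetical anchor for illustration
check_anchor(seqs, anc)
```

Checking bounds first catches off-by-one errors between 0-based string indices and the 1-based positions the anchor fields use.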
Results • The specified domain is aligned and the remainder of the sequences is aligned automatically respecting the constraints given by the anchor points:
>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK
>seq4
WRMDSNQKNPNNPKAAYNKGDANAPK