MBoMS Genomics of Model Microbes Lab 4: Multiple sequence alignment

Presentation Transcript


  1. MBoMS Genomics of Model Microbes Lab 4: Multiple sequence alignment

  2. Multiple Sequence Alignments • Your next task will be to create multiple sequence alignments • This is one of the most challenging steps in the study of molecular evolution • Too often, a person will simply feed several sequences into an alignment program and accept whatever is produced • In MANY cases, the alignment will contain serious errors and may include sequences that are not homologs

  3. Overview of Multiple Alignments • If you ask an alignment program to align two sequences, it will • That does not mean the sequences are homologs, just that the algorithm has found an alignment which maximizes the number of identical and similar residues or nucleotides • If you are searching for similar domains or motifs, GREAT: the alignment program may have done a decent job • However, in many, many cases, the alignment program will only have created a starting point; you will have to use your knowledge of the molecule, your intuition, or your instincts to optimize the alignment
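To illustrate the point that a program will align any two sequences it is given, here is a minimal sketch of global pairwise alignment (Needleman-Wunsch) in Python. The scoring values (match +1, mismatch -1, gap -2) and the two test sequences are made up for illustration; real programs such as CLUSTAL use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a simple linear gap penalty (illustrative scoring)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        # Score for pairing a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic-programming table
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one best alignment
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[-1][-1]


# Two made-up peptides with no residues in common still come back "aligned".
row_a, row_b, total = needleman_wunsch("MKTAYIAK", "GGSPLWQE")
print(row_a)
print(row_b)
print("score:", total)
```

The algorithm always returns its best-scoring arrangement, even for unrelated input; a low score is the only hint that the sequences may not be homologs.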

  4. Use Your Brain • Once you have an alignment, look at it • Do the gaps seem reasonable, or does it look like you could have done a better job by eye? • If you could have done better, adjust the alignment by hand • Are there large regions absent in one sequence? • If yes, you may need to delete those regions in the input file • Is one sequence much shorter? • If yes, you should crop the others or eliminate the short one from consideration
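The "is one sequence much shorter?" check can be done before aligning. The sketch below flags sequences whose length falls below a fraction of the longest one. The 0.8 cutoff, the function name, and the example sequences are assumptions for illustration, not values from the lab.

```python
def flag_length_outliers(seqs, tolerance=0.8):
    """Return names of sequences shorter than tolerance * the longest sequence.

    seqs maps a sequence name to its (ungapped) sequence string.
    The 0.8 default is an arbitrary illustrative cutoff.
    """
    longest = max(len(s) for s in seqs.values())
    return [name for name, s in seqs.items() if len(s) < tolerance * longest]


# Hypothetical example: the third sequence is a fragment.
example = {
    "spA_1": "MKTAYIAKQRQISFVKSHFSRQ",
    "spA_2": "MKTAYIAKQRQISFVKSHFSRQ",
    "spA_3": "MKTAYIAKQ",
}
print(flag_length_outliers(example))
```

A flagged sequence is only a prompt to look more closely; whether to crop the others or drop the short one remains a judgment call.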

  5. More On Alignments • The ideal result is an alignment in which all aligned positions are homologous and all insertions and deletions represent real events in the evolution of the sequences • Since we cannot know the true history, we settle for a reasonable approximation by assigning and adjusting gap penalties • If gaps could be inserted without penalty, we could align any two unrelated sequences perfectly • Alignment programs prevent that by penalizing the alignment score for each gap and for each additional residue in a gap
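The two-part penalty described above (one cost for opening a gap, a smaller cost for each additional residue in it) is often called an affine gap penalty. The sketch below scores an already-aligned pair of rows under that scheme. The function name, the match/mismatch values, and the default penalties are illustrative assumptions, not CLUSTAL's actual scoring.

```python
def affine_alignment_score(row_a, row_b, match=1.0, mismatch=-1.0,
                           gap_open=10.0, gap_extend=0.5):
    """Score two aligned rows: the first position of a gap costs gap_open,
    each further position in the same gap costs gap_extend."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    score = 0.0
    prev_gap = None  # which row ("a" or "b") the previous column gapped, if any
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            raise ValueError("column with gaps in both rows")
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            score -= gap_extend if prev_gap == side else gap_open
            prev_gap = side
        else:
            score += match if x == y else mismatch
            prev_gap = None
    return score


# Made-up example: one gap of length 3 versus three separate gaps of length 1.
print(affine_alignment_score("MKTAYIAK", "MK---IAK"))
print(affine_alignment_score("MKTAYIAK", "M-T-Y-AK"))
```

Both arrangements have five matches and three gapped positions, but the single long gap is penalized far less than three separate gaps. Raising or lowering the two penalties shifts which arrangement a program prefers.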

  6. Gap penalties • Can we improve our alignment? • We begin by increasing or decreasing gap penalties and multiple alignment penalties in the program • Each time, we should print out the alignment and see how it compares to the previous one • Although time consuming, this is the single most important thing you can do to ensure you have the best possible alignment with that data set

  7. Advice on alignments • Treat them cautiously • Can usually be improved by eye • Often helps to have color coding • Depending on use, the user should be able to judge which regions are reliable and which are not • For phylogenetic reconstruction, only use those positions whose hypothesis of positional homology is unimpeachable

  8. Exercise 1 • The first target gene is efp • Go to the NCBI main page, type efp into the search box, and click Go • You will be taken to Entrez, where there will be lots of information about the gene • Learn a bit about the gene and the encoded protein • Try to find the gene in a genome from each of your two species • Try genome BLAST (found in the tool box on the microbial genome resource page) • Do both species have the gene? • Is the gene similar in the two species? • Cut and paste the protein sequences into a Word document

  9. Exercise 2 • The next exercise involves creating multiple alignments for this target protein • These alignments should be fairly simple, since the proteins should be either identical or very similar when compared from within a species

  10. Exercise 2, cont. • Start with Efp • You have already found this protein in the genome of each of your two species • Now, find the same protein in two more genomes for each species • Paste all three copies of the target protein from your first species into the Word file • Repeat with a new Word file, pasting in all three copies of the target protein from your second species

  11. Exercise 2, cont. • Now edit each Word file to look exactly like this (except use YOUR species designation, not E. coli’s):

      >Ecefp1
      PROTEIN SEQUENCE ***IN ALL CAPS***
      BLANK LINE
      >Ecefp2
      PROTEIN SEQUENCE
      BLANK LINE
      >Ecefp3
      PROTEIN SEQUENCE

  12. Sample input file

      >Ecefp1
      YQHVKPGKGAAFVRAKIKSFLDGKVIEKTFHAGDKCEEPNLVEKTMQYLY
      HDGDTYQFMDIESYEQIALNDSQVGEASKWMLDGMQVQVLLHNDKAISVD
      VPQVVALKIVETAPNFKGDTSSASKKPATLETGTVV

      >Ecefp2
      YQHVKPGKGAAFVRAKIKSFLDGKVIEKTFHAGDKCEEPNLVEKTMQYLY
      HDGDTYQFMDIESYEQIALNDSQVGEAKKWMLDGMQVQVLLHNDKAISVD
      VPQVVALKIVETAPNFKGDTSSASKKPATLETGAVV

      >Ecefp3
      YQHVKPGKGAAFVRAKIKSFLDGKVIEKTFHAGDKCEEPNLVEKTMQYLY
      HDGDTYQFMDIESYEQIALNDSQVGEASKLMLDGMQVQVLLHNDKAISVD
      VPQVVALKIVETAPNFKGDTSSASKKPATLETGAVV
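Before pasting a file like the sample above into an alignment program, it can help to check that it follows the expected layout: a ">" header line, then the sequence in capital letters. The following is a minimal sketch of such a check in plain Python; the function names are hypothetical, and the two records are copied from the sample input file.

```python
def parse_fasta(text):
    """Parse FASTA-style text into {name: sequence}, joining wrapped lines."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines between records are allowed
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError("header line has no name")
            name = parts[0]
            if name in records:
                raise ValueError(f"duplicate name: {name}")
            records[name] = ""
        else:
            if name is None:
                raise ValueError("sequence data before the first header")
            records[name] += line
    return records


def check_records(records):
    """Return a list of human-readable problems (empty list means none found)."""
    problems = []
    for name, seq in records.items():
        if not seq:
            problems.append(f"{name}: no sequence")
        elif not (seq.isalpha() and seq.isupper()):
            problems.append(f"{name}: sequence is not all capital letters")
    return problems


SAMPLE = """>Ecefp1
YQHVKPGKGAAFVRAKIKSFLDGKVIEKTFHAGDKCEEPNLVEKTMQYLY
HDGDTYQFMDIESYEQIALNDSQVGEASKWMLDGMQVQVLLHNDKAISVD
VPQVVALKIVETAPNFKGDTSSASKKPATLETGTVV

>Ecefp2
YQHVKPGKGAAFVRAKIKSFLDGKVIEKTFHAGDKCEEPNLVEKTMQYLY
HDGDTYQFMDIESYEQIALNDSQVGEAKKWMLDGMQVQVLLHNDKAISVD
VPQVVALKIVETAPNFKGDTSSASKKPATLETGAVV
"""

records = parse_fasta(SAMPLE)
for name, seq in records.items():
    print(name, len(seq))
print(check_records(records))
```

This only checks the layout. It does not confirm that the characters are valid amino-acid codes or that the sequences are the right proteins.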

  13. Exercise 3 • Now go to the CLUSTALW web site: • http://www.ebi.ac.uk/Tools/clustalw2/index.html • This software provides a robust sequence alignment algorithm • At the top of the page, there is a box to insert your sequences • Simply cut and paste the three sequences from the first Word document and click Run

  14. Sample Clustal Output

      Results of search
      Number of sequences   3
      Alignment score       2403
      Sequence format       pearson
      Sequence type         aa
      Jalview tab:          start Jalview
      Output file           clustalw2-20080328-01104802.output
      Alignment file        clustalw2-20080328-01104802.aln
      Guide tree file       clustalw2-20080328-01104802.dnd
      Your input file       clustalw2-20080328-01104802.input

  15. Sample Clustal Output

      SeqA  Name    Len(aa)  SeqB  Name    Len(aa)  Score
      =============================================================
      1     Ecefp1  136      2     Ecefp2  136      98
      1     Ecefp1  136      3     Ecefp3  136      98
      2     Ecefp2  136      3     Ecefp3  136      98
      =============================================================

      Ecefp1          YQHVKPGKGAAFVRAKIKSFLDGKVIEKTFHAGDKCEEPNLVEKTMQYLYHDGDTYQFMD 60
      Ecefp2          YQHVKPGKGAAFVRAKIKSFLDGKVIEKTFHAGDKCEEPNLVEKTMQYLYHDGDTYQFMD 60
      Ecefp3          YQHVKPGKGAAFVRAKIKSFLDGKVIEKTFHAGDKCEEPNLVEKTMQYLYHDGDTYQFMD 60
                      ************************************************************

      Ecefp1          IESYEQIALNDSQVGEASKWMLDGMQVQVLLHNDKAISVDVPQVVALKIVETAPNFKGDT 120
      Ecefp2          IESYEQIALNDSQVGEASKLMLDGMQVQVLLHNDKAISVDVPQVVALKIVETAPNFKGDT 120
      Ecefp3          IESYEQIALNDSQVGEAKKWMLDGMQVQVLLHNDKAISVDVPQVVALKIVETAPNFKGDT 120
                      *****************.* ****************************************

      Ecefp1          SSASKKPATLETGTVV 136
      Ecefp2          SSASKKPATLETGAVV 136
      Ecefp3          SSASKKPATLETGAVV 136
                      *************:**
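The pairwise scores and the "*" marks in the output above can be checked by hand. The sketch below computes percent identity for each pair of rows and rebuilds the fully conserved ("*") columns. It assumes the score column is a truncated percent identity over ungapped columns, which is consistent with this output but is stated here as an assumption. It does not reproduce the ":" and "." marks, which depend on CLUSTAL's residue similarity groups. The rows are copied from the sample output.

```python
from itertools import combinations

HEAD = "YQHVKPGKGAAFVRAKIKSFLDGKVIEKTFHAGDKCEEPNLVEKTMQYLYHDGDTYQFMD"
ROWS = {
    "Ecefp1": HEAD
    + "IESYEQIALNDSQVGEASKWMLDGMQVQVLLHNDKAISVDVPQVVALKIVETAPNFKGDT"
    + "SSASKKPATLETGTVV",
    "Ecefp2": HEAD
    + "IESYEQIALNDSQVGEASKLMLDGMQVQVLLHNDKAISVDVPQVVALKIVETAPNFKGDT"
    + "SSASKKPATLETGAVV",
    "Ecefp3": HEAD
    + "IESYEQIALNDSQVGEAKKWMLDGMQVQVLLHNDKAISVDVPQVVALKIVETAPNFKGDT"
    + "SSASKKPATLETGAVV",
}


def percent_identity(row_a, row_b):
    """Percent of ungapped aligned columns where the two residues are identical."""
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(x == y for x, y in pairs)
    return 100.0 * same / len(pairs)


def identity_line(rows):
    """'*' where every row has the same residue in a column, ' ' otherwise."""
    return "".join("*" if len(set(col)) == 1 and col[0] != "-" else " "
                   for col in zip(*rows))


for (name_a, row_a), (name_b, row_b) in combinations(ROWS.items(), 2):
    print(name_a, name_b, int(percent_identity(row_a, row_b)))

marks = identity_line(list(ROWS.values()))
print("variable columns:", [i + 1 for i, m in enumerate(marks) if m != "*"])
```

Listing the variable columns is a quick way to decide where to look first when inspecting an alignment by eye.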

  16. Sample CLUSTAL Output: Phylogenetic tree • [Figure: guide tree with leaves Ecefp1, Ecefp2, Ecefp3]

  17. Exercise 4 • Gap Penalties and Extensions • CLUSTAL employs a set of default values for gap and extension penalties • What are these? • Your next task is to try several larger and smaller values for the gap and extension penalties • Did your alignment change with changes in these penalties? • Decide which values are best for each and write a paragraph in your lab notebook about how you decided what was best
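The lab uses the CLUSTALW web form, but the same parameter sweep can be organized from the command line if a local copy of clustalw2 is available (an assumption; it is not required for the exercise). The sketch below only builds the command lines, one per combination of gap opening and gap extension penalty, and does not run them. The input file name and the penalty values are hypothetical placeholders, not recommended settings.

```python
import os
from itertools import product


def clustalw2_grid(infile, gap_opens, gap_exts):
    """Build one clustalw2 command (as an argument list) per penalty combination."""
    stem = os.path.splitext(infile)[0]
    commands = []
    for gap_open, gap_ext in product(gap_opens, gap_exts):
        outfile = f"{stem}_go{gap_open}_ge{gap_ext}.aln"
        commands.append([
            "clustalw2",
            f"-INFILE={infile}",
            "-ALIGN",
            f"-GAPOPEN={gap_open}",   # gap opening penalty
            f"-GAPEXT={gap_ext}",     # gap extension penalty
            f"-OUTFILE={outfile}",
        ])
    return commands


# Hypothetical sweep: 3 opening penalties x 3 extension penalties = 9 runs.
for cmd in clustalw2_grid("ecefp.fasta", [5, 10, 20], [0.1, 0.2, 0.5]):
    print(" ".join(cmd))
```

Giving each run its own output file makes it easier to compare the alignments side by side, as the exercise asks.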

  18. Exercise 5 • Repeat this exercise with each of the target proteins and with both of your species • Gene names: infB, rpoC, accA, thyA, purA • Find the genes and paste the encoded protein sequences into your Word files • If one or both of your species are missing any of these genes/proteins, here are some alternates: atpA, tpiA, pheS • Input/Output • A total of 12 alignment files will be made • Proteins 1 - 6 for species A • Proteins 1 - 6 for species B • A total of 12 alignments and 12 trees will be produced

  19. Exercise 6 • For the next lab, think about the following, which we will discuss as a class: • The entire class will discuss the challenges they encountered with their multiple alignments • We need to reach a consensus on how we aligned these proteins in different species • Are some species more difficult than others? • Why might that be? • Are certain proteins not as useful in this exercise? • Why might that be? • Do we need to add or subtract any of our proteins?
