
Applied Bioinformatics

Applied Bioinformatics. Week 7 Jens Allmer. Practice I. Homework Feedback. Review Rough Writing Guidelines Word Template. Topic. Multiple Sequence Alignment Review Building an MSA Editing an MSA Dendrograms Phylogenetic Trees. Choosing Sequences. How many?


Presentation Transcript


  1. Applied Bioinformatics Week 7 Jens Allmer

  2. Practice I

  3. Homework Feedback • Review Rough Writing Guidelines • Word Template

  4. Topic • Multiple Sequence Alignment Review • Building an MSA • Editing an MSA • Dendrograms • Phylogenetic Trees

  5. Choosing Sequences • How many? • 10 – 15 (fewer than 50 would be good) • Sequences should be more than 30% and less than 90% identical • Prefer sequences of similar length • Prefer sequences without internal repeats, or excise the repeats
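
The identity guideline above can be checked with a few lines of Python. This is a minimal sketch, assuming the two sequences are already aligned to the same length, that `-` marks a gap, and that identity is counted only over columns where neither sequence has a gap (other definitions exist). The function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def within_guideline(a: str, b: str, low: float = 30.0, high: float = 90.0) -> bool:
    """True if the pair is more than `low` and less than `high` percent identical."""
    return low < percent_identity(a, b) < high
```

A pair that fails `within_guideline` is either too divergent to align reliably or so similar that it adds little information to the MSA.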

  6. Choosing Sequences • While choosing your sequences give them good names • Some sequences should be well annotated

  7. Create an MSA • This time use 20 – 50 sequences • From different species • Use ClustalW for alignment • Most ClustalW servers display a dendrogram • Confirm this by using a few of them

  8. Gathering Sequences • Download the sequences as a FASTA file as well • Most programs will support this format
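
FASTA is plain text: each record starts with a `>` header line, followed by one or more sequence lines. A minimal sketch of reading such a file in Python, assuming the first whitespace-delimited token of the header is used as the sequence name (a common convention, not a rule of the format). `parse_fasta` is a hypothetical helper name.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA-formatted text into a {name: sequence} dictionary."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            records[name] = ""
        elif name is not None:
            records[name] += line  # sequence may span several lines
    return records
```

Usage: `parse_fasta(open("sequences.fasta").read())`, where `sequences.fasta` stands for whatever file was downloaded.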

  9. Output Formats • Many different formats • FASTA: widely supported • PDF: only for printing, storing, sharing • PIR: similar to FASTA • MSF: common MSA format • ALN: subset of MSF

  10. Converting Formats • http://bioweb.pasteur.fr/seqanal/interfaces/fmtseq.html • Names (>…) no longer than 15 characters • Different formats maintain different data • Converting can therefore lose data • Make sure to have a master copy
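
Cutting names to 15 characters by hand can make two sequences end up with the same name. A minimal sketch of a safer approach, assuming a numeric suffix is acceptable for disambiguation; `shorten_names` is a hypothetical helper, and the returned mapping should be stored with the master copy so the original names can be restored.

```python
def shorten_names(names, max_len=15):
    """Map each original name to a unique name of at most `max_len` characters."""
    mapping = {}
    used = set()
    for name in names:
        short = name[:max_len]
        counter = 1
        while short in used:
            # Name already taken: make room for a numeric suffix such as "_1".
            suffix = f"_{counter}"
            short = name[: max_len - len(suffix)] + suffix
            counter += 1
        used.add(short)
        mapping[name] = short
    return mapping
```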

  11. Editing Alignments • http://www.jalview.org • Start the program • Choose File – Input Alignment – from Textbox • Copy and paste the ClustalW alignment

  12. Dendrogram • Jalview also allows you to view different types of Dendrograms based on different similarity measures • Use Jalview and compare the trees that are constructed based on the different measures

  13. End Practice I • 15 min break

  14. Theory I

  15. Phylogeny • Sources • Sequences • Clades • Organisms • Why • Understand evolution • Strain diversity • Epidemiology • Gene prediction

  16. Dendrogram http://en.wikipedia.org/wiki/Dendrogram

  17. Phylogenetic Tree

  18. Tree Terminology • All circled elements (e.g., a) are called nodes • The connections between them are called edges or branches • The first node that forms the tree is called the root (here abcdef) • Terminal nodes that have only one connection are called leaves (e.g., a) • Unrooted trees (remove the red root)
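
The terminology maps directly onto a small data structure. In this sketch a leaf is a string and an internal node is a tuple of its children. The topology below is an assumption made up for illustration (six leaves a to f under one root), not a reconstruction of the figure on the slide.

```python
def leaves(node):
    """Return the leaf names below `node`, left to right."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node:
        result.extend(leaves(child))
    return result


def count_nodes(node):
    """Count all nodes (leaves and internal nodes) in the subtree."""
    if isinstance(node, str):
        return 1
    return 1 + sum(count_nodes(child) for child in node)


def to_newick(node):
    """Serialize the subtree in Newick notation (without the closing ';')."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


# Hypothetical rooted tree with leaves a..f; the outermost tuple is the root.
tree = ("a", (("b", "c"), (("d", "e"), "f")))
```

In any tree the number of edges is the number of nodes minus one, so `count_nodes(tree) - 1` gives the branch count. `to_newick(tree) + ";"` yields a string that most tree viewers accept.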

  19. Branch Length • Arbitrary • Similarity • Evolutionary Time

  20. Tree types • A dendrogram is a broad term for the diagrammatic representation of a phylogenetic tree. • A cladogram is a tree formed using cladistic methods. This type of tree only represents a branching pattern, i.e., its branch lengths do not represent time. • A phylogram is a phylogenetic tree that explicitly represents number of character changes through its branch lengths. • A chronogram is a phylogenetic tree that explicitly represents evolutionary time through its branch lengths.

  21. Sequences • DNA • Sensitive but quite divergent at longer distances • Use for very closely related organisms • cDNA • Still sensitive but less divergent (e.g., no introns) • Use for closely related families • Protein • Least sensitive but most useful for more distant relationships • Use for distantly related species • 16S rRNA • Exists in all prokaryotes (eukaryotes carry the homologous 18S rRNA) • Highly conserved

  22. Overall Process • Get Sequences • Construct MSA • Compute pairwise distances (for some methods) • Build Tree • Topology • Branch Lengths • Estimate accuracy, reliability • Build several different trees for that • Visualize the tree

  23. Computational Tree Formation • Distance Methods • Neighbor-Joining • Least-Squares • UPGMA • Parsimony • Least number of evolutionary steps • Maximum Likelihood • The tree with the highest likelihood under the chosen model of evolution is constructed
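
The parsimony criterion ("least number of evolutionary steps") can be made concrete with Fitch's algorithm, which counts the minimum number of changes a given tree needs to explain one alignment column. This is a minimal sketch that assumes a rooted binary tree written as nested tuples with leaf names as strings. It scores a given tree and does not search for the best one.

```python
def fitch(node, states):
    """Return (possible_states, changes) for the subtree at `node`.

    `states` maps each leaf name to its character in one alignment column.
    """
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node
    left_states, left_changes = fitch(left, states)
    right_states, right_changes = fitch(right, states)
    common = left_states & right_states
    if common:
        # Children agree on at least one state: no extra change needed here.
        return common, left_changes + right_changes
    # Children disagree: one change must have happened below this node.
    return left_states | right_states, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch change counts over all columns of the alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )
```

A parsimony program computes this score for many candidate topologies and keeps the tree with the lowest total.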

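  24. Neighbor Joining • Bottom-up clustering method • 1. Create the distance matrix • 2. Join the closest pair of nodes (closeness is corrected for each node's total distance to all other nodes) • 3. Repeat steps 1–2 until fully joined • http://en.wikipedia.org/wiki/Neighbor_joining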
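
The steps above can be sketched in plain Python. This is a minimal, unoptimized illustration of the standard neighbor-joining recurrences, assuming `dist` is a symmetric dict of dicts covering every pair of names. The internal node labels (`node0`, `node1`, ...) and the edge-list return format are choices made for this sketch.

```python
def neighbor_joining(names, dist):
    """Return the unrooted NJ tree as a list of (node, neighbor, branch_length)."""
    d = {a: dict(dist[a]) for a in names}  # copy so the input is not modified
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        totals = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Pick the pair that minimizes Q(a, b) = (n - 2) * d(a, b) - r(a) - r(b).
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - totals[a] - totals[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"node{next_id}"
        next_id += 1
        # Branch lengths from the new internal node to the joined pair.
        len_a = 0.5 * d[a][b] + (totals[a] - totals[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        edges.append((u, a, len_a))
        edges.append((u, b, len_b))
        # Distances from the new node to every remaining node.
        d[u] = {}
        for k in nodes:
            if k not in (a, b):
                d[u][k] = d[k][u] = 0.5 * (d[a][k] + d[b][k] - d[a][b])
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))  # connect the last two nodes
    return edges
```

When several pairs tie on Q, this sketch keeps the first one it encounters, so the join order (though not necessarily the final unrooted tree) depends on the order of `names`.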

  25. Least Squares • Standard approximation approach • Minimizes the sum of squared errors • Example: PGLS (Phylogenetic Generalized Least Squares) • Needs additional data (traits) • http://www.dynamicgeometry.com/General_Resources/Advanced_Sketch_Gallery/Other_Explorations/Statistics_Collection/Least_Squares.html
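
For distance-based tree building, the "error" being squared is the difference between an observed pairwise distance and the path length between the same two leaves in the tree. A minimal sketch of that objective, assuming the tree is given as an edge list of `(node, neighbor, length)` tuples and `observed` is a symmetric dict of dicts over the leaf names. It only evaluates a given tree and leaves out any weighting scheme.

```python
from itertools import combinations


def path_lengths(edges, start):
    """Distance from `start` to every node reachable along the tree's edges."""
    adjacency = {}
    for u, v, length in edges:
        adjacency.setdefault(u, []).append((v, length))
        adjacency.setdefault(v, []).append((u, length))
    dist = {start: 0.0}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor, length in adjacency.get(node, []):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + length
                stack.append(neighbor)
    return dist


def least_squares_error(edges, observed):
    """Sum over leaf pairs of (observed distance - tree path length) squared."""
    error = 0.0
    for a, b in combinations(sorted(observed), 2):
        tree_distance = path_lengths(edges, a)[b]
        error += (observed[a][b] - tree_distance) ** 2
    return error
```

A least-squares method would search over topologies and branch lengths for the tree that makes this value as small as possible.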

  26. UPGMA • Unweighted Pair Group Method with Arithmetic Mean • Agglomerative hierarchical clustering method • Assumes a constant rate of evolution
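
A minimal UPGMA sketch in plain Python, assuming `dist` is a symmetric dict of dicts covering every pair of names. The nested-tuple output and the internal cluster labels are choices made for this illustration.

```python
def upgma(names, dist):
    """Return (tree, root_height); the tree is nested tuples of leaf names."""
    d = {a: dict(dist[a]) for a in names}  # copy so the input is not modified
    size = {a: 1 for a in names}
    height = {a: 0.0 for a in names}
    tree = {a: a for a in names}
    active = list(names)
    next_id = 0
    while len(active) > 1:
        # Join the two clusters with the smallest average distance.
        a, b = min(
            ((x, y) for i, x in enumerate(active) for y in active[i + 1:]),
            key=lambda pair: d[pair[0]][pair[1]],
        )
        u = f"cluster{next_id}"
        next_id += 1
        d[u] = {}
        for k in active:
            if k not in (a, b):
                # Arithmetic mean, weighted by how many leaves each cluster holds.
                avg = (size[a] * d[a][k] + size[b] * d[b][k]) / (size[a] + size[b])
                d[u][k] = d[k][u] = avg
        size[u] = size[a] + size[b]
        height[u] = d[a][b] / 2  # constant-rate assumption: both sides equally deep
        tree[u] = (tree[a], tree[b])
        active = [k for k in active if k not in (a, b)] + [u]
    root = active[0]
    return tree[root], height[root]
```

The line that sets `height[u]` is where the constant-rate assumption enters: both joined clusters are placed at the same depth below the new node, which is only justified if all lineages evolve at the same rate.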

  27. Similarity Measures • Sequence • Number of different positions • Weighted differences • Substitution Matrices • Pairwise alignments • Needleman-Wunsch (NW), Smith-Waterman (SW), ... • Additional measurements or knowledge • Traits • Parsimony • Number of changes for tree paths
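
The simplest measure on this list, the number of different positions, turns an MSA into the pairwise distance matrix that the distance methods need. A minimal sketch, assuming the alignment is a `{name: aligned_sequence}` dict, that `-` marks a gap, and that columns with a gap in either sequence are ignored. Weighted variants would replace the 0/1 mismatch count with scores from a substitution matrix.

```python
def p_distance(a, b):
    """Fraction of differing positions among columns without gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no columns without gaps to compare")
    return sum(1 for x, y in pairs if x != y) / len(pairs)


def distance_matrix(alignment):
    """Build a symmetric dict-of-dicts distance matrix from an alignment."""
    names = list(alignment)
    return {
        x: {y: p_distance(alignment[x], alignment[y]) for y in names}
        for x in names
    }
```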

  28. Tree Accuracy • Bootstrapping • Resample • Recompute • Do many times • Compare results http://www.sciencedirect.com/science/article/pii/S0191814107000156
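
The resample, recompute, compare loop can be sketched as follows. To stay short, this illustration does not rebuild a full tree for each replicate. As a stand-in it asks how often the closest pair of sequences in the original alignment is still the closest pair after the columns are resampled with replacement. That simplification, and all function names, are assumptions of this sketch.

```python
import random
from itertools import combinations


def mismatches(a, b):
    """Number of positions at which two aligned sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def closest_pair(alignment):
    """Pair of sequence names with the fewest mismatches."""
    return min(
        combinations(sorted(alignment), 2),
        key=lambda pair: mismatches(alignment[pair[0]], alignment[pair[1]]),
    )


def resample_columns(alignment, rng):
    """Draw alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def bootstrap_support(alignment, replicates=100, seed=0):
    """Fraction of replicates in which the original closest pair is recovered."""
    rng = random.Random(seed)
    reference = closest_pair(alignment)
    hits = sum(
        closest_pair(resample_columns(alignment, rng)) == reference
        for _ in range(replicates)
    )
    return reference, hits / replicates
```

In a real analysis the recompute step would build a complete tree for each replicate, and support would be reported per branch.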

  29. http://goergen.deviantart.com/art/Magic-Forrest-Wallpaper-139108299

  30. End Theory I • Relax • Mindmap • Break

  31. Practice II

  32. Where to get Trees • Most servers that allow for MSA will also provide at least the guide tree which was used to construct the alignment • If that’s all you are interested in you don’t need to go any further

  33. Edit your MSA • Remove blocks consisting of mostly gaps (using Jalview) • Remove N- and C-termini if they are not well conserved
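
The same cleanup can be scripted when an alignment is too large to edit by hand. A minimal sketch, assuming the alignment is a `{name: aligned_sequence}` dict, that `-` marks a gap, and that "mostly gaps" means more than half of the sequences have a gap in that column. The threshold is an assumption and can be changed through `max_gap_fraction`.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5):
    """Remove columns in which the fraction of gaps exceeds `max_gap_fraction`."""
    if not alignment:
        return {}
    sequences = list(alignment.values())
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same aligned length")
    keep = [
        i
        for i in range(length)
        if sum(seq[i] == "-" for seq in sequences) / len(sequences) <= max_gap_fraction
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}
```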

  34. Easy Tree • www.ebi.ac.uk/clustalw/ • Paste your alignment • Select a tree type • Other options need to be set (see right) • Press run • Take a screenshot • You can paste it where needed

  35. Phylip (More elaborate tree) • http://bioweb.pasteur.fr/seqanal/phylogeny/phylip-uk.html • Choose protdist from the page • Paste the MSA • Bootstrapping e.g.:

  36. Phylip • Run the query • Click further analysis

  37. Click Run • Select full screen view • There is your tree

  38. Ugly Tree • Let’s face it, the tree is quite ugly • http://iubio.bio.indiana.edu/treeapp/treeprint-form.html • Select the consense.outtree from the previous website and paste it into the box • Select submit to create the tree • Play around with the formats and settings

  39. Tree Topologies

  40. Other Resources • http://en.wikipedia.org/wiki/List_of_phylogenetics_software • http://itol.embl.de/
