Remarks About Homework

Remarks About Homework. Write detailed answers. Pay attention to details in the questions. “… nor can the shy man learn…”. Multiple Sequence Alignment (MSA) and Phylogeny. One of the options to get multiple sequence Fasta file.

Presentation Transcript


  1. Remarks About Homework • Write detailed answers • Pay attention to details in the questions • “… nor can the shy man learn…”

  2. Multiple Sequence Alignment (MSA) and Phylogeny

  3. One of the options to get multiple sequence Fasta file

  4. One of the options to get multiple sequence Fasta file

  5. MSA input: multiple sequence Fasta file
     >gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens]
     MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQIKILGNQGSFLT
     KGPSKLNDRADSRRSLWDQGNFPLIIKNLKIEDSDTYICEVEDQKEEVQLLVFGLTANSDTHLLQGQSLT
     LTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLELQDSGTWTCTVLQNQKKVEFKIDIVVLAFQKASSI
     VYKKEGEQVEFSFPLAFTVEKLTGSGELWWQAERASSSKSWITFDLKNKEVSVKRVTQDPKLQMGKKLPL
     HLTLPQALPQYAGSGNLTLALEAKTGKLHQEVNLVVMRATQLQKNLTCEVWGPTSPKLMLSLKLENKEAK
     VSKREKAVWVLNPEAGMWQCLLSDSGQVLLESNIKVLPTWSTPVQPMALIVLGGVAGLLLFIGLGIFFCV
     RCRHRRRQAERMSQIKRLLSEKKTCQCPHRFQKTCSPI
     >gi|57113961|ref|NP_001009043.1| CD4 antigen [Pan troglodytes]
     MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQTKILGNQGSFLT
     KGPSKLNDRVDSRRSLWDQGNFTLIIKNLKIEDSDTYICEVGDQKEEVQLLVFGLTANSDTHLLQGQSLT
     LTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLELQDSGTWTCTVLQNQKKVEFKIDIVVLAFQKASSI
     VYKKEGEQVEFSFPLAFTVEKLTGSGELWWQAERASSSKSWITFDLKNKEVSVKRVTQDPKLQMGKKLPL
     HLTLPQALPQYAGSGNLTLALEAKTGKLHQEVNLVVMRATQLQKNLTCEVWGPTSPKLMLSLKLENKEAK
     VSKREKAVWVLNPEAGMWQCLLSDSGQVLLESNIKVLPTWSTPVQPMALIVLGGVAGLLLFIGLGIFFCV
     RCRHRRRQAQRMSQIKRLLSEKKTCQCPHRFQKTCSPI
     >gi|50054438|ref|NP_001001908.1| CD4 antigen [Sus scrofa]
     MDPGTSLRHLFLVLQLAMLPAASGTQEKYLVLGKAGDLAELPCHSSQKKNLPFNWKNSNQTKILGGHGSF
     WHTASVTELTSRLDSKKNMWDHGSFPLIIKNLEVTDSGIYICEVEDKRIEVQLLVFRLTASVTRVLLGQS
     LTLTLEGPSGSHPTVQWKGPGNKSKNDVKSLLLPQVGLEDSGLWTCTVSQDQKTLVFRSNIFVLAFQKVP
     STVYVKEGDQVALSFPLTFEAESLSGELMWRQTKGASSPQSWITFSLKDRKVTVQKSLQNLKLRMAEKLP
     LQITLLQALPQYAGSGNLTLVLPEGRLHREVNLVVMRATQSKNEVTCEVLGPTPPKVVLSLKLGNQSMKV
     SDQQKLVTVLDPEAGMWRCLLRDKDKVLLESQVEVLPTAFTRAWPELLASVIGGIIGLLFLAGFCIACVK
     CWHRRRRAERMSQIKRLLSEKKTCQCAHRQQKNYSLT
     >gi|6978631|ref|NP_036837.1| Cd4 molecule [Rattus norvegicus]
     MCRGFSFRHLLPLLLLQLSKLLVVTQGKTVVLGKEGGSAELPCESTSRRSASFAWKSSDQKTILGYKNKL
     LIKGSLELYSRFDSRKNAWERGSFPLIINKLRMEDSQTYVCELENKKEEVELWVFRVTFNPGTRLLQGQS
     LTLILDSNPKVSDPPIECKHKSSNIVKDSKAFSTHSLRIQDSGIWNCTVTLNQKKHSFDMKLSVLGFAST
     SITAYKSEGESAEFSFPLNLGEESLQGELRWKAEKAPSSQSWITFSLKNQKVSVQKSTSNPKFQLSETLP
     LTLQIPQVSLQFAGSGNLTLTLDRGILYQEVNLVVMKVTQPDSNTLTCEVMGPTSPKMRLILKQENQEAR
     VSRQEKVIQVQAPEAGVWQCLLSEGEEVKMDSKIQVLSKGLNQTMFLAVVLGSAFSFLVFTGLCILFCVR
     CRHQQRQAARMSQIKRLLSEKKTCQCSHRMQKSHNLI
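A multiple-sequence Fasta file like the one on this slide can be read with a few lines of Python. The sketch below is a minimal parser and not the reader that Clustal X uses; the function name `parse_fasta` and the short toy records are illustrative assumptions.

```python
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records: Dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequence may span several lines
    return records


# Toy input (hypothetical records, much shorter than the CD4 sequences)
example = """>seq1 toy record
MNRGVPF
RHLL
>seq2 another toy record
MDPGTSL
"""

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```

Each header line starts a new record, and every following line up to the next `>` is appended to that record's sequence.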

  6. Clustal X

  7. Step1: Load the sequences

  8. Uploaded sequences A little unclear…

  9. Edit Fasta headers… Replace each long header with a short name; the sequences themselves are unchanged from slide 5:
     >gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens] → >Homo_sapiens_CD4
     >gi|57113961|ref|NP_001009043.1| CD4 antigen [Pan troglodytes] → >Pan_troglodytes_CD4
     >gi|50054438|ref|NP_001001908.1| CD4 antigen [Sus scrofa] → >Sus_scrofa_CD4
     >gi|6978631|ref|NP_036837.1| Cd4 molecule [Rattus norvegicus] → >Rattus_norvegicus_CD4
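The header edit on this slide can also be scripted instead of done by hand. The sketch below assumes that every header ends with the species name in square brackets, as the four CD4 headers do; the function names and the default gene label `CD4` are illustrative choices, not part of any tool mentioned in the slides.

```python
import re

# Matches a header line that ends with "[Genus species]"
HEADER_RE = re.compile(r"^>.*\[(?P<species>[^\]]+)\]\s*$")


def shorten_header(line: str, gene: str = "CD4") -> str:
    """Turn '>gi|...| description [Genus species]' into '>Genus_species_GENE'."""
    match = HEADER_RE.match(line)
    if match is None:
        return line  # leave headers we do not recognise untouched
    species = "_".join(match.group("species").split())
    return f">{species}_{gene}"


def rename_fasta_headers(text: str, gene: str = "CD4") -> str:
    """Apply shorten_header to every header line; sequence lines pass through."""
    return "\n".join(
        shorten_header(line, gene) if line.startswith(">") else line
        for line in text.splitlines()
    )


print(shorten_header(">gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens]"))
```

Short names without spaces or pipe characters display cleanly as sequence labels and tree leaves, which is the point of the "Much better" slide that follows.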

  10. Uploaded sequences Much better

  11. Step2: Perform alignment

  12. Multiple Sequence Alignment and conservation view

  13. Step 3: Create tree

  14. The Newick tree format is used to represent trees as strings [Figure: a tree with leaves A, C, B, D] In Newick format: ((A,C),(B,D)); • Each pair of parentheses () encloses a clade in the tree • A comma “,” separates the members of the corresponding clade • A semicolon “;” is always the last character
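The three rules on this slide are enough to write a small parser. The sketch below handles only the simplified Newick shown here (leaf names, parentheses, commas, and the final semicolon) and ignores branch lengths and bootstrap labels, which real Newick files such as the .ph output may contain. The name `parse_newick` is an illustrative assumption.

```python
def parse_newick(s: str):
    """Parse simplified Newick into nested tuples, e.g. (('A', 'C'), ('B', 'D'))."""
    s = s.strip()
    if not s.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1  # consume '(' which opens a clade
            children = [parse_node()]
            while s[pos] == ",":
                pos += 1  # consume ',' which separates clade members
                children.append(parse_node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ')'
            return tuple(children)
        # Otherwise read a leaf name up to the next structural character
        start = pos
        while s[pos] not in ",();":
            pos += 1
        name = s[start:pos].strip()
        if not name:
            raise ValueError(f"empty leaf name at position {start}")
        return name

    tree = parse_node()
    if pos != len(s) - 1:
        raise ValueError("unexpected text before the final ';'")
    return tree


print(parse_newick("((A,C),(B,D));"))
```

Each clade becomes a tuple and each leaf a string, so the nesting of the tuples mirrors the nesting of the parentheses.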

  15. Step 4: View tree with NJPlot Note: unrooted tree

  16. [Figure: several drawings of a tree with leaves A, B, and C, joined by equals signs (=) to show that they are the same tree]

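  17. Rooted vs. unrooted trees [Figure: three rooted trees on leaves A, B, and C, numbered 1 to 3 and marked as different from one another (≠), next to one unrooted tree on the same leaves with positions 1, 2, and 3 marked]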

  18. How would each tree look in Newick format? Rooted trees: ((A,B),C) ≠ ((A,C),B) ≠ ((C,B),A). Unrooted tree: (A,B,C)

  19. Step 4.5: defining an outgroup

  20. Step 4: View tree with NJPlot Note: The order inside a split doesn’t matter

  21. [Figure: drawings of the same Gorilla, Human, Chimp tree with the branches rotated] (Gorilla,(Human,Chimp)) = (Gorilla,(Chimp,Human)) = ((Human,Chimp),Gorilla) = ((Chimp,Human),Gorilla)
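One way to check that two Newick strings describe the same rooted tree is to sort the members of every clade before comparing. The sketch below assumes the trees are already available as nested tuples of leaf names; the function name `canonical` and the sort key are illustrative choices.

```python
def canonical(tree):
    """Return the tree with the members of every clade in a fixed order."""
    if isinstance(tree, str):
        return tree  # a leaf has nothing to reorder
    children = [canonical(child) for child in tree]
    # repr() gives one sort key that works for both leaves and sub-clades
    return tuple(sorted(children, key=repr))


forms = [
    ("Gorilla", ("Human", "Chimp")),
    ("Gorilla", ("Chimp", "Human")),
    (("Human", "Chimp"), "Gorilla"),
    (("Chimp", "Human"), "Gorilla"),
]
all_same = len({canonical(t) for t in forms}) == 1
print(all_same)

# Different groupings stay different after reordering
different = canonical((("A", "B"), "C")) != canonical((("A", "C"), "B"))
print(different)
```

Reordering changes only how the tree is drawn, so all four Gorilla, Human, Chimp forms collapse to one canonical tuple, while trees that group the leaves differently do not.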

  22. How robust is our tree?

  23. How robust is our tree? • We need some statistical way to estimate the confidence in the tree topology (like we need the E-value to estimate the confidence of a BLAST hit) • But we don’t know anything about the distribution of tree topologies • The only data source we have is our data (MSA) • So, we must rely on our own resources: “pull up by your own bootstraps”

  24. Bootstrap

  25. Bootstrap 1. Create n (100-1000) new MSAs (pseudo-datasets) by randomly sampling K positions from our original MSA with replacement
      Original MSA (positions 1 2 3 4 5 … K):
        1 : ATCTG…A
        2 : ATCTG…C
        3 : ACTTA…C
        4 : ACCTA…T
      Pseudo-dataset from sampled positions 1 1 2 4 4 … 3:
        1 : AATTT…C
        2 : AATTT…C
        3 : AACTT…T
        4 : AACTT…C
      Pseudo-dataset from sampled positions 9 7 4 7 8 … 10:
        1 : TTTTA…T
        2 : CATAC…A
        3 : CATAC…T
        4 : AGTGG…A
      Pseudo-dataset from sampled positions 5 1 5 7 8 … 12:
        1 : GAGTA…T
        2 : GAGAC…G
        3 : AAAAC…A
        4 : AAAGG…C
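The resampling step can be sketched in a few lines of Python. This is a minimal illustration and not the Clustal X implementation. The toy alignment uses only the first five columns visible on the slide (the rest is elided there), and the function names are assumptions.

```python
import random
from typing import Dict, List


def resample_columns(msa: Dict[str, str], columns: List[int]) -> Dict[str, str]:
    """Build a pseudo-dataset from the given 0-based column indices."""
    return {name: "".join(seq[c] for c in columns) for name, seq in msa.items()}


def bootstrap_msa(msa: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Sample K columns with replacement, where K is the alignment length."""
    lengths = {len(seq) for seq in msa.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    k = lengths.pop()
    columns = [rng.randrange(k) for _ in range(k)]
    return resample_columns(msa, columns)


# First five columns of the slide's original MSA (toy data)
toy_msa = {"1": "ATCTG", "2": "ATCTG", "3": "ACTTA", "4": "ACCTA"}

# Slide positions 1 1 2 4 4 are 0-based indices 0 0 1 3 3
slide_like = resample_columns(toy_msa, [0, 0, 1, 3, 3])
print(slide_like)

# 100 random pseudo-datasets, seeded so the run is repeatable
rng = random.Random(0)
pseudo_datasets = [bootstrap_msa(toy_msa, rng) for _ in range(100)]
```

Whole columns are sampled, never single characters, so every pseudo-dataset keeps the sequences aligned with one another.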

  26. Bootstrap 2. Reconstruct a pseudo-tree from each pseudo-dataset using the same method used for reconstructing the original tree [Figure: the three pseudo-datasets from step 1, each shown with its own pseudo-tree over Sp1, Sp2, Sp3, and Sp4]

  27. Bootstrap 3. For each node in our original tree, we count the number of times it appeared in the pseudo-trees [Figure: the pseudo-trees next to the original tree over Sp1, Sp2, Sp3, and Sp4, whose nodes are labeled with bootstrap values of 67% and 100%]
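The counting step can be sketched by describing each node as the set of leaves below it. The example is a simplification: it treats the trees as rooted nested tuples, whereas tree-building programs typically compare splits of unrooted trees. The three pseudo-trees are made up so that the numbers come out like those on the slide; they are not the slide's actual trees, and the function names are assumptions.

```python
def clades(tree):
    """Return the set of clades in a nested-tuple tree, each as a frozenset of leaves."""
    found = set()

    def leaves_below(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(leaves_below(child) for child in node))
        found.add(leaves)  # record every internal node, including the root
        return leaves

    leaves_below(tree)
    return found


def bootstrap_support(original, pseudo_trees):
    """Percentage of pseudo-trees that contain each clade of the original tree."""
    pseudo_clades = [clades(t) for t in pseudo_trees]
    return {
        clade: 100.0 * sum(clade in pc for pc in pseudo_clades) / len(pseudo_trees)
        for clade in clades(original)
    }


# Hypothetical trees over four species
original = ((("Sp1", "Sp2"), "Sp3"), "Sp4")
pseudo_trees = [
    ((("Sp1", "Sp2"), "Sp3"), "Sp4"),
    ((("Sp2", "Sp1"), "Sp3"), "Sp4"),  # same grouping, different drawing order
    ((("Sp1", "Sp3"), "Sp2"), "Sp4"),  # Sp1 groups with Sp3 instead
]

support = bootstrap_support(original, pseudo_trees)
for clade, value in sorted(support.items(), key=lambda kv: len(kv[0])):
    print(sorted(clade), round(value))
```

Using leaf sets makes the count independent of the order inside a split: the second pseudo-tree still supports the (Sp1, Sp2) node even though it lists Sp2 first.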

  28. Step 3.5 - Bootstrap

  29. Bootstrap values on NJPlot Note: ClustalX saves trees with the .ph extension. Trees with bootstrap values are saved with the .phb extension

  30. Reconstructing the tree of life

  31. Darwin’s vision of the tree of life from the Origin of Species

  32. Based on molecular data (SSU rRNA), the branching of several kingdoms remains in dispute

  33. Lateral Gene Transfer (LGT) Challenges the Conceptual Basis of Phylogenetic Classification

  34. Toward Automatic Reconstruction of a Highly Resolved Tree of Life. Science, 3 March 2006, Vol. 311, no. 5765, pp. 1283-1287

  35. Methodology • Started with 36 genes universally present in 191 species (spanning all 3 domains of life), for which orthologs could be unambiguously identified • Eliminated 5 genes that are LGT suspects (mostly tRNA synthetases) • Constructed an MSA for each of the 31 orthogroups • Concatenated all 31 MSAs to a super-MSA of 8090 columns • The phylogeny was reconstructed based on the super-MSA using the maximum likelihood approach
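The concatenation step in this methodology amounts to joining, species by species, the aligned rows of every per-gene MSA. The sketch below assumes each alignment contains exactly the same species, which is a simplification; the slide does not say how missing genes were handled. The function name and the toy alignments are illustrative.

```python
from typing import Dict, List


def concatenate_msas(msas: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-gene alignments into one super-MSA, keyed by species name."""
    if not msas:
        raise ValueError("need at least one alignment")
    species = set(msas[0])
    for msa in msas:
        if set(msa) != species:
            raise ValueError("every alignment must contain the same species")
        if len({len(row) for row in msa.values()}) != 1:
            raise ValueError("rows within one alignment must have equal length")
    return {sp: "".join(msa[sp] for msa in msas) for sp in sorted(species)}


# Two toy gene alignments (hypothetical data; '-' marks a gap)
gene1 = {"Sp1": "MKV-L", "Sp2": "MKVAL", "Sp3": "MRV-L"}
gene2 = {"Sp1": "GG-T", "Sp2": "GGAT", "Sp3": "GG-S"}

super_msa = concatenate_msas([gene1, gene2])
for name, row in super_msa.items():
    print(name, row)
```

The super-MSA has as many columns as all the input alignments together, which is how 31 separate alignments can add up to a single matrix of 8090 columns.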

  36. Archaea Eukaryota Bacteria

  37. Tree support • 81.7% of the branches show bootstrap support of over 80% • 65% of the branches show bootstrap support of 100% • However, several deep branchings show low support
