Slowly approaching grass specific gene diversification OR We need to fix those phylogenies first!

Presentation Transcript


  1. Slowly approaching grass specific gene diversification OR We need to fix those phylogenies first!

  2. gene family: a set of divergent but functionally related genes that descend from the same ancestral gene • species A: 5 copies • species B: 15 copies

  3. mechanisms increasing gene copy number: • tandem duplication • segmental duplication • whole genome duplication; retention of duplicated gene copies: • large quantities of a gene product are needed • specialization for functions, locations, times

  4. lineage-specific diversification • species A: 5 copies • species B: 15 copies • species 3 has 5 gene copies

  6. NBS-LRR resistance gene families • in Arabidopsis: ~150-200 gene copies • in rice: ~500-700 gene copies • CC: coiled-coil domain • NBS: nucleotide-binding site domain • LRR: leucine-rich repeats

  7. grasses are agronomically very important [figure: plant phylogeny with groups labeled monocots, dicots, gymnosperms, mosses, ferns]

  8. Research objectives • I will search plant gene families for grass-specific expansions • I will identify those containing known resistance genes or their interacting partners • I will test for co-evolution of known resistance genes with their interacting partners • I will determine whether co-evolution with resistance genes is a new means to identify interacting partners of these genes

  9. Phytome • protein-coding sequence data from 39 plant species • 26,393 families with ≥ 2 members • 307,492 singleton families • related families and subfamilies • multiple alignments and phylogenies • motif and domain structure information

  10. identifying grass specific expansions • counting genes per taxon is not sufficient! • identify gene family phylogenies that contain many successive grass-specific internal nodes • identify duplication and speciation nodes for each gene family • label duplication nodes with grass-specific nodes

  11. identify successive grass-specific nodes in practice: a Perl script • accesses the Phytome database • takes every tree stored in Phytome • and, comparing it to the species tree, labels its internal nodes according to the common ancestor of all descendant leaf nodes [figure: species tree and gene tree]
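
The labeling step on this slide can be illustrated with a small Python sketch. It is not the Perl script from the talk and it does not touch the Phytome database; the toy species tree, the nested-tuple gene tree, and the function names are all assumptions made for illustration. Each internal gene-tree node receives the most recent common ancestor, in the species tree, of the species found at its descendant leaves.

```python
# Hypothetical toy species tree, stored as child -> parent (root has parent None).
SPECIES_PARENT = {
    "rice": "grasses",
    "maize": "grasses",
    "grasses": "angiosperms",
    "Arabidopsis": "angiosperms",
    "angiosperms": None,
}


def path_to_root(taxon):
    """Return the taxa from `taxon` up to the root, inclusive."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = SPECIES_PARENT[taxon]
    return path


def common_ancestor(a, b):
    """Most recent common ancestor of two species-tree nodes."""
    ancestors_of_a = set(path_to_root(a))
    for taxon in path_to_root(b):
        if taxon in ancestors_of_a:
            return taxon
    raise ValueError("taxa are not in the same tree")


def label_nodes(gene_tree, labels=None):
    """Label every internal node of a nested-tuple gene tree.

    Leaves are species names (one per gene copy); internal nodes are tuples.
    Returns the label of this node and a post-order list of (subtree, label).
    """
    if labels is None:
        labels = []
    if isinstance(gene_tree, str):
        return gene_tree, labels
    label = None
    for child in gene_tree:
        child_label, _ = label_nodes(child, labels)
        label = child_label if label is None else common_ancestor(label, child_label)
    labels.append((gene_tree, label))
    return label, labels


# Toy gene tree: two rice copies and one maize copy, plus an Arabidopsis outgroup.
gene_tree = ((("rice", "maize"), "rice"), "Arabidopsis")
root_label, labels = label_nodes(gene_tree)
for subtree, label in labels:
    print(label, subtree)
```

In this toy tree the two innermost nodes are both labeled `grasses`, which is the kind of run of successive grass-specific nodes the search is looking for.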

  12. identify duplication and speciation nodes • SDI: Speciation Duplication Inference (Zmasek & Eddy 2001, Bioinformatics) [figure: gene tree with speciation nodes and duplication nodes marked; label (7)]
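
As a rough illustration of how duplication and speciation nodes can be told apart, here is a minimal Python sketch of the standard mapping rule used in this kind of gene tree/species tree reconciliation: each gene-tree node is mapped to the common ancestor of its children's mappings, and it is called a duplication when it maps to the same species-tree node as one of its children. This is not the SDI program itself; the toy species tree and all names are assumptions.

```python
# Hypothetical toy species tree, stored as child -> parent (root has parent None).
SPECIES_PARENT = {
    "rice": "grasses",
    "maize": "grasses",
    "grasses": "angiosperms",
    "Arabidopsis": "angiosperms",
    "angiosperms": None,
}


def ancestors(taxon):
    """Taxa from `taxon` up to the root, inclusive."""
    chain = []
    while taxon is not None:
        chain.append(taxon)
        taxon = SPECIES_PARENT[taxon]
    return chain


def lca(a, b):
    """Most recent common ancestor of two species-tree nodes."""
    seen = set(ancestors(a))
    return next(t for t in ancestors(b) if t in seen)


def infer_events(node, events):
    """Map a binary gene-tree node onto the species tree, recording event types.

    Leaves are species names; internal nodes are 2-tuples (left, right).
    Appends (mapped species-tree node, event kind) in post-order.
    """
    if isinstance(node, str):
        return node
    left, right = node  # fails on multifurcations: a binary tree is required
    m_left = infer_events(left, events)
    m_right = infer_events(right, events)
    mapped = lca(m_left, m_right)
    kind = "duplication" if mapped in (m_left, m_right) else "speciation"
    events.append((mapped, kind))
    return mapped


# Toy gene tree: a duplication in the grass ancestor, then rice/maize speciation.
gene_tree = ((("rice", "maize"), ("rice", "maize")), "Arabidopsis")
events = []
infer_events(gene_tree, events)
for mapped, kind in events:
    print(mapped, kind)
```

The sketch only accepts strictly binary trees, which is why unresolved (multifurcating) gene trees are a problem for this kind of inference.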

  13. which grass-specific nodes are duplication nodes?

  14. required: labeled duplication/speciation nodes PROBLEM FOR SDI: UNRESOLVED GENE TREES!

  15. required: accurate gene phylogenies PROBLEM FOR DISTANCE METHODS: NO OVERLAP OF PARTIAL SEQUENCES!

  16. digressing from grass-specific expansion: How can we generate phylogenies from these “partial sequence alignments”? • required for grass-specific expansion project • important for Phytome • necessary for anyone using EST data for phylogenetic analysis

  17. How can we generate correct phylogenies from “partial sequence alignments”? 1. can’t directly compute a single distance matrix with all sequences 2. divide alignment into sub-sections, compute separate pairwise distance matrices: matrixA, matrixB 3. combine these into one single distance matrix, use it for phylogenetic reconstruction • GOAL: define columns and sequences for sub-matrices
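
The divide-and-combine idea can be sketched in Python as follows. The slide does not say how the sub-matrices are combined, so the unweighted averaging used here is purely an assumption, as are the toy alignment, the simple p-distance, and all function names.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing sites between two equal-length, gap-free strings."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def sub_matrix(alignment, names, start, end):
    """Pairwise p-distances for `names` over alignment columns [start, end)."""
    return {
        frozenset((s, t)): p_distance(alignment[s][start:end], alignment[t][start:end])
        for s, t in combinations(names, 2)
    }


def combine(*matrices):
    """Average each pair's distance over the sub-matrices that contain it.

    Assumption: plain unweighted averaging; pairs that never share a
    sub-section are simply absent from the result.
    """
    collected = {}
    for matrix in matrices:
        for pair, dist in matrix.items():
            collected.setdefault(pair, []).append(dist)
    return {pair: sum(ds) / len(ds) for pair, ds in collected.items()}


# Toy alignment: seqC lacks the first half, seqE lacks the second half.
alignment = {
    "seqA": "ACDEFGHIKL",
    "seqB": "ACDEFGHIKV",
    "seqC": "-----GHLKL",
    "seqE": "ACDQF-----",
}
matrixA = sub_matrix(alignment, ["seqA", "seqB", "seqE"], 0, 5)
matrixB = sub_matrix(alignment, ["seqA", "seqB", "seqC"], 5, 10)
combined = combine(matrixA, matrixB)
for pair, dist in sorted(combined.items(), key=lambda item: sorted(item[0])):
    print(sorted(pair), round(dist, 3))
```

Note that seqC and seqE never appear in the same sub-matrix, so the combined matrix has no entry for that pair; a tree-building step would need some way to handle such missing distances.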

  18. The OverlapGraph 1. Sequence alignment 2. Overlap matrix 3. Overlap graph 4. Find largest cliques (complete subgraphs)
      seqA XXXXXXXXXXXXX
      seqB XXXXXXXXXXXXX
      seqC -------XXXXXX
      seqD ------XXXXXXX
      seqE XXXXXXX------
      seqF XXXXXX-------
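
The four steps on this slide can be tried out on the example alignment with a short Python sketch. The minimum overlap of 3 columns is an assumed threshold (the slides do not give one), and the clique search is a plain Bron-Kerbosch enumeration of maximal cliques chosen for illustration, not necessarily what the original implementation used.

```python
from itertools import combinations

# Step 1: the example alignment from the slide ('X' = residue, '-' = missing).
alignment = {
    "seqA": "XXXXXXXXXXXXX",
    "seqB": "XXXXXXXXXXXXX",
    "seqC": "-------XXXXXX",
    "seqD": "------XXXXXXX",
    "seqE": "XXXXXXX------",
    "seqF": "XXXXXX-------",
}
MIN_OVERLAP = 3  # assumed threshold for calling two sequences overlapping


def overlap(a, b):
    """Number of columns in which both sequences have a residue."""
    return sum(1 for x, y in zip(a, b) if x != "-" and y != "-")


# Step 2: overlap matrix (stored per unordered pair).
overlap_matrix = {
    (s, t): overlap(alignment[s], alignment[t])
    for s, t in combinations(alignment, 2)
}

# Step 3: overlap graph as an adjacency dictionary.
adjacency = {name: set() for name in alignment}
for (s, t), shared in overlap_matrix.items():
    if shared >= MIN_OVERLAP:
        adjacency[s].add(t)
        adjacency[t].add(s)


# Step 4: enumerate maximal cliques (basic Bron-Kerbosch, no pivoting).
def maximal_cliques(adj):
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(current)
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return cliques


cliques = maximal_cliques(adjacency)
for clique in sorted(sorted(c) for c in cliques):
    print(clique)
```

With this threshold the example yields two cliques, one for each half of the alignment, and the two full-length sequences belong to both.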

  22. problem: clique overlap [figure: alignment and overlap graph]

  23. problem: clique overlap • Clique A: 1, 2, 3, 4, 5, 7 • Clique B: 1, 3, 4, 5, 6, 7 • Clique C: 4, 5, 8, 12, 13 • Clique D: 4, 5, 8, 9, 10, 11, 12, 13 [figure: alignment and overlap graph]

  24. new strategy includes merging cliques 1. partial sequence alignment 2. generate OverlapGraph, find cliques 3. merge overlapping cliques 4. find connected components
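
One possible reading of steps 3 and 4 is sketched below in Python, using the four cliques listed on the preceding slide. The slides do not state the merge criterion, so linking two cliques when their Jaccard similarity is at least 0.5 is an assumption, as is treating each connected component of the resulting clique graph as one merged group.

```python
# Cliques A-D as listed on the "problem: clique overlap" slide.
cliques = {
    "A": {1, 2, 3, 4, 5, 7},
    "B": {1, 3, 4, 5, 6, 7},
    "C": {4, 5, 8, 12, 13},
    "D": {4, 5, 8, 9, 10, 11, 12, 13},
}


def jaccard(a, b):
    """Shared members divided by total distinct members."""
    return len(a & b) / len(a | b)


def merge_cliques(cliques, threshold=0.5):
    """Merge cliques that overlap strongly (assumed rule: Jaccard >= threshold).

    Cliques are linked pairwise, then every connected component of the
    resulting clique graph is collapsed into one set of sequences.
    """
    names = list(cliques)
    links = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(cliques[a], cliques[b]) >= threshold:
                links[a].add(b)
                links[b].add(a)

    merged, seen = [], set()
    for start in names:
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:  # depth-first walk over one connected component
            name = stack.pop()
            if name in seen:
                continue
            seen.add(name)
            members |= cliques[name]
            stack.extend(links[name] - seen)
        merged.append(members)
    return merged


merged = merge_cliques(cliques)
for group in merged:
    print(sorted(group))
```

Under this assumed rule, A and B merge (they share five of seven sequences) and C is absorbed into D, while the two resulting groups stay separate because they share only sequences 4 and 5.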

  25. Validation • How can we test whether this method will really generate the best phylogeny possible? Use artificial data! • ROSE: Random model Of Sequence Evolution (Stoye et al. 1998, Bioinformatics) • input: root sequence, tree topology • output: a family of related sequences, created from the root sequence by insertion, deletion and substitution (i.e., sequences with a known evolutionary history), and a correct multiple alignment of these sequences

  26. Validation • vary numbers of sequences per alignment (e.g., two alternatives: 10 and 50 sequences) • vary tree topologies (e.g., four alternatives: low resolution at deep nodes, low resolution at high nodes, no low resolution, imbalanced tree) • vary alignment lengths (e.g., two alternatives: 50 and 200 aa) • vary average branch lengths/distances (two different mutation probabilities) • vary masks (e.g., five alternatives, based on deletion patterns of Phytome families)
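
If every listed alternative is crossed with every other, the design above amounts to 2 × 4 × 2 × 2 × 5 = 160 simulation conditions. The short Python sketch below enumerates them; the dictionary keys, the mask names, and the two mutation-probability labels are placeholders, since the slides give no actual values for them.

```python
from itertools import product

# Factors and alternatives from the slide; names and labels are placeholders.
GRID = {
    "n_sequences": [10, 50],
    "topology": [
        "low resolution at deep nodes",
        "low resolution at high nodes",
        "no low resolution",
        "imbalanced tree",
    ],
    "alignment_length_aa": [50, 200],
    "mutation_probability": ["lower", "higher"],  # actual values not given
    "mask": ["mask1", "mask2", "mask3", "mask4", "mask5"],  # hypothetical names
}

# One dictionary per combination of alternatives (full factorial design).
conditions = [dict(zip(GRID, combo)) for combo in product(*GRID.values())]

print(len(conditions))
print(conditions[0])
```

Whether the validation actually used the full factorial design is not stated; the count only shows how quickly the number of simulated data sets grows.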

  27. Actual results soon to follow!

  28. gene tree vs. species tree [figure: species A, species B, species C]

  32. gene tree vs. species tree [figure: species tree with species A, species B, species C; gene tree with gene in species A, gene in species B, gene in species C]

  33. what if gap boundaries aren’t so clear? what if some cliques are contained within others?

  34. problem: clique overlap

  35. grass-specific diversification: patterns • duplication event prior to diversification of the grass lineage • duplication events after diversification of the grass lineage • lineage-specific diversification of an orthologous ancestor • lineage-specific genes
