
Comparative Genomics


keefe




Presentation Transcript


  1. Comparative Genomics

  2. Overview • Orthologues and paralogues • Protein families • Genome-wide DNA alignments • Syntenic blocks

  3. Comparative Genomics • Allows us to achieve a greater understanding of vertebrate evolution • Tells us what is common and what is unique between different species at the genome level • The function of human genes and other genomic regions may be revealed by studying their counterparts in other organisms • Helps identify coding genes, non-coding genes and regulatory elements

  4. Species in Ensembl [Figure: species in Ensembl placed against geological time in million years before present (Cambrian, Ordovician, Silurian, Devonian, Carboniferous, Permian, Triassic, Jurassic, Cretaceous, Tertiary); lineages shown: mammals (placentals, marsupials, monotremes), birds (passerines, paleognaths, other birds), reptiles (crocodiles, turtles, lizards), amphibians, teleosts, sharks, rays, Latimeria, bichir/Polypterus, lungfishes, agnathans and non-vertebrates]

  5. Orthologue / Paralogue Prediction Algorithm
  (1) Load the longest translation of each gene from all species in Ensembl.
  (2) Compare every gene against every other gene, both within the same species and across species, with WU-BLASTP followed by Smith-Waterman, genome by genome.
  (3) Build a graph of gene relations based on Best Reciprocal Hits (BRH) and Blast Score Ratio (BSR) values.
  (4) Extract the connected components (= single-linkage clusters); each cluster represents a gene family.
  (5) For each cluster, build a multiple alignment of the protein sequences using MUSCLE.
  (6) For each aligned cluster, build a phylogenetic tree using PHYML. The tree is unrooted at this stage.
  (7) Reconcile each gene tree with the species tree using RAP, which calls duplication events on internal nodes and roots the tree.
  (8) From each gene tree, infer pairwise orthology and paralogy relationships between genes.
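Steps (3) and (4) can be illustrated with a small Python sketch. It is not Ensembl's pipeline code. It assumes alignment scores (including each gene's score against itself) have already been computed and are supplied as a dictionary keyed by (query, hit). It defines BSR as the pair score divided by the larger of the two self-scores and uses an arbitrary 0.33 cut-off; both are assumptions, since the slides give neither. All gene names are hypothetical.

```python
from collections import defaultdict

def best_hits(scores, species):
    """Return (query, hit) pairs where hit is the query's top-scoring gene in hit's species."""
    best = {}  # (query, target species) -> (score, hit)
    for (q, h), s in scores.items():
        if q == h or species[q] == species[h]:
            continue  # in this sketch, best hits are only sought between different species
        key = (q, species[h])
        if key not in best or s > best[key][0]:
            best[key] = (s, h)
    return {(q, hit) for (q, _), (_, hit) in best.items()}

def blast_score_ratio(scores, a, b):
    """Pair score normalised by the larger self-score (an assumed, symmetric BSR variant)."""
    return scores[(a, b)] / max(scores[(a, a)], scores[(b, b)])

def build_graph(scores, species, min_bsr=0.33):
    """Link two genes if they are reciprocal best hits or their BSR reaches min_bsr."""
    hits = best_hits(scores, species)
    graph = defaultdict(set)
    for a, b in scores:
        if a == b:
            continue
        is_brh = (a, b) in hits and (b, a) in hits
        if is_brh or blast_score_ratio(scores, a, b) >= min_bsr:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def single_linkage_clusters(graph, genes):
    """Connected components of the gene graph; each one is a candidate gene family."""
    seen, clusters = set(), []
    for start in sorted(genes):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            gene = stack.pop()
            component.append(gene)
            for neighbour in graph[gene]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters

# Toy data: two human genes (hA, hB) with mouse orthologues, plus a human paralogue hC of hA.
species = {"hA": "human", "hB": "human", "hC": "human", "mA": "mouse", "mB": "mouse"}
scores = {(g, g): 100 for g in species}  # self-hit scores
for (x, y), s in {("hA", "mA"): 90, ("hA", "mB"): 20, ("hB", "mB"): 80,
                  ("hB", "mA"): 25, ("hA", "hC"): 50}.items():
    scores[(x, y)] = scores[(y, x)] = s  # symmetric scores for simplicity

clusters = single_linkage_clusters(build_graph(scores, species), species)
print(clusters)  # [['hA', 'hC', 'mA'], ['hB', 'mB']]
```

Because clusters are connected components, one qualifying link is enough to pull a gene into a family: here the within-species paralogue hC joins hA's cluster through its BSR alone.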

  6. Homologue Relationships • Orthologues: any pair of genes whose most recent common ancestor node in the gene tree is a speciation event • Paralogues: any pair of genes whose most recent common ancestor node is a duplication event
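The two definitions translate directly into a tree walk: any two genes taken from different child subtrees of an internal node have that node as their most recent common ancestor, so the node's event decides the relationship. The sketch below assumes an already rooted and reconciled gene tree (as after step 7) in a hypothetical `Node` structure; it ignores the finer orthologue and paralogue subtypes that Ensembl reports.

```python
from dataclasses import dataclass, field
from itertools import combinations

@dataclass
class Node:
    name: str
    event: str = "leaf"  # "speciation" or "duplication" for internal nodes
    children: list = field(default_factory=list)

def leaf_names(node):
    """All gene names at or below a node."""
    if not node.children:
        return [node.name]
    return [g for child in node.children for g in leaf_names(child)]

def homology_relations(node, relations=None):
    """Map each unordered gene pair to 'orthologue' or 'paralogue' via their ancestor node."""
    if relations is None:
        relations = {}
    label = {"speciation": "orthologue", "duplication": "paralogue"}
    # Genes drawn from two different children of this node have it as their last common ancestor.
    for left, right in combinations(node.children, 2):
        for a in leaf_names(left):
            for b in leaf_names(right):
                relations[frozenset((a, b))] = label[node.event]
    for child in node.children:
        homology_relations(child, relations)
    return relations

# Hypothetical reconciled tree: a duplication followed by a human/mouse speciation in each copy.
tree = Node("dup", "duplication", [
    Node("spA", "speciation", [Node("human_A"), Node("mouse_A")]),
    Node("spB", "speciation", [Node("human_B"), Node("mouse_B")]),
])
relations = homology_relations(tree)
print(relations[frozenset(("human_A", "mouse_A"))])  # orthologue
print(relations[frozenset(("human_A", "human_B"))])  # paralogue
```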

  7. Orthologue and Paralogue Types

  8. Orthologue and Paralogue types

  9. GeneView

  10. GeneView

  11. GeneTreeView [Screenshot labels: MUSCLE protein alignment; gene tree]

  12. GeneTreeView [Screenshot legend: speciation nodes in blue; duplication nodes in red]

  13. Protein Dataset More than 1,500,000 proteins clustered: • All Ensembl protein predictions from all supported species: ~670,000 • All metazoan (animal) proteins in UniProt: ~80,000 from UniProt/Swiss-Prot and ~830,000 from UniProt/TrEMBL

  14. Clustering Strategy • BLASTP all-versus-all comparison • Markov clustering • For each cluster: calculation of a multiple sequence alignment with ClustalW and assignment of a consensus description
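Markov clustering (MCL) alternates expansion (squaring the column-stochastic matrix, which spreads random-walk flow) and inflation (raising entries to a power and renormalising columns, which strengthens strong flows) until the matrix stops changing. The pure-Python sketch below assumes an unweighted similarity graph, an inflation value of 2.0 and a simple read-out rule (each node joins the row that attracts most of its flow); the parameters actually used for Ensembl families are not given in the slides. The toy graph shows the contrast with single linkage: two triangles joined by one edge form a single connected component but two MCL clusters.

```python
def normalise_columns(m):
    """Scale every column to sum to 1 so it is a probability distribution."""
    n = len(m)
    totals = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / totals[j] for j in range(n)] for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def markov_cluster(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    n = len(adjacency)
    # Self-loops are a common MCL preprocessing step; they also keep every column sum positive.
    m = normalise_columns([[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                           for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: let flow spread along longer paths
        # inflation: powering then renormalising boosts strong flows and starves weak ones
        inflated = normalise_columns([[x ** inflation for x in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Read-out: each node (column) joins the attractor row receiving most of its flow.
    groups = {}
    for j in range(n):
        attractor = max(range(n), key=lambda i: m[i][j])
        groups.setdefault(attractor, []).append(j)
    return sorted(groups.values())

# Two triangles (0-1-2 and 3-4-5) joined by a single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    adjacency[a][b] = adjacency[b][a] = 1.0
print(markov_cluster(adjacency))  # [[0, 1, 2], [3, 4, 5]]
```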

  15. GeneView / TransView / ProtView [Screenshot: link to FamilyView]

  16. FamilyView [Screenshot labels: consensus annotation; JalView multiple alignments; Ensembl family members within human; UniProt family members; Ensembl family members in other species]

  17. JalView

  18. Whole Genome Alignments • Functional sequences evolve more slowly than non-functional sequences; sequences that remain conserved across species may therefore perform a biological function. • Comparing genomic sequences from species at different evolutionary distances allows us to identify: • Coding genes • Non-coding genes • Non-coding regulatory sequences
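The idea that conserved sequence flags candidate function can be illustrated with a sliding-window identity scan over a pairwise alignment. This is a toy sketch, not how Ensembl computes its conservation scores or constrained elements; the window size of 10 columns and the 80% identity cut-off are arbitrary assumptions, and gap columns are simply treated as mismatches.

```python
def conserved_windows(seq_a, seq_b, window=10, min_identity=0.8):
    """Slide a fixed window along a gapped pairwise alignment; keep windows with high identity."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    hits = []
    for start in range(len(seq_a) - window + 1):
        columns = zip(seq_a[start:start + window], seq_b[start:start + window])
        matches = sum(1 for a, b in columns if a == b and a != "-")  # gaps never count as matches
        if matches / window >= min_identity:
            hits.append((start, start + window))
    return hits

def merge_windows(hits):
    """Merge overlapping or touching windows into conserved regions (half-open coordinates)."""
    regions = []
    for start, end in hits:
        if regions and start <= regions[-1][1]:
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [tuple(region) for region in regions]

# Toy alignment: identical flanks around a 10-column block of mismatches.
seq_a = "ACGTACGTAC" + "GGGGGGGGGG" + "TTGCATTGCA"
seq_b = "ACGTACGTAC" + "CCCCCCCCCC" + "TTGCATTGCA"
print(merge_windows(conserved_windows(seq_a, seq_b)))  # [(0, 12), (18, 30)]
```

The merged regions extend two columns into the mismatched block because a window tolerates up to 20% mismatches; that blurring at the edges is the usual trade-off of fixed-window scoring.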

  19. Selection of Species for DNA Comparisons
  Species compared with human | Genome size (Gbp) | Time since divergence | Sequence conservation in coding regions | Aids identification of
  Chimpanzee | 3.0 | ~5 MYA | >99% | Recently changed sequences and genomic rearrangements
  Mouse | 2.5 | ~65 MYA | ~80% | Both coding and non-coding sequences
  Opossum | 4.2 | ~150 MYA | ~70-75% | Both coding and non-coding sequences
  Pufferfish | 0.4 | ~450 MYA | ~65% | Primarily coding sequences

  20. Alignment Algorithm • Should find all highly similar regions between two sequences • Should allow for segments without similarity, rearrangements, etc. • Issues: • Computationally heavy • Scalability, as more and more genomes are sequenced • Time constraints
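The first two requirements (find highly similar regions, tolerate dissimilar segments) are what local alignment provides. Below is a textbook Smith-Waterman sketch with an assumed linear gap penalty and toy scores (match +2, mismatch -1, gap -2). Its O(mn) time and memory is the "heavy process" issue in concrete form: it is fine for short sequences but not for whole chromosomes, which is why genome aligners such as BLASTZ use seed-and-extend heuristics instead.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment with a linear gap penalty; returns (score, segment of a, segment of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # h[i][j]: best score of an alignment ending at a[i-1], b[j-1]
    best = (0, 0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start anywhere, so dissimilar flanks are ignored.
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best[0]:
                best = (h[i][j], i, j)
    score, i, j = best
    end_i, end_j = i, j
    # Trace back until the score drops to 0, i.e. the start of the local alignment.
    while i > 0 and j > 0 and h[i][j] > 0:
        if h[i][j] == h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return score, a[i:end_i], b[j:end_j]

print(smith_waterman("TTTTGATTACAGGGG", "CCGATTACACC"))  # (14, 'GATTACA', 'GATTACA')
```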

  21. BLASTZ-net, tBLAT and PECAN • BLASTZ-net (nucleotide-level comparison) is used for evolutionarily close species, e.g. human - mouse • Translated BLAT (amino-acid-level comparison) is used for more distantly related species, e.g. human - zebrafish • PECAN is used for multispecies alignments: • 7 eutherian mammals • 10 amniote vertebrates
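Why an amino-acid-level comparison helps for distant species can be shown with the genetic code alone: synonymous substitutions lower nucleotide identity while leaving the protein unchanged. This sketch is not BLAT; it translates only reading frame 1 (a real translated search considers all six frames), and the two coding fragments are invented for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna):
    """Translate reading frame 1 of a DNA string ('*' marks a stop codon)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def identity(x, y):
    """Fraction of identical positions between two equal-length, ungapped sequences."""
    return sum(p == q for p, q in zip(x, y)) / len(x)

# Hypothetical orthologous coding fragments differing only by synonymous substitutions.
human = "ATGGCTCTGAAA"
fish = "ATGGCCTTAAAG"
print(round(identity(human, fish), 2))              # 0.67 at the nucleotide level
print(translate(human), translate(fish))            # MALK MALK
print(identity(translate(human), translate(fish)))  # 1.0 at the amino-acid level
```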

  22. BLASTZ-net, tBLAT and PECAN The species combinations for which whole-genome alignments are available are listed on the Comparative Genomics page (Help & Documentation > Genomic Data > Comparative Genomics).

  23. ContigView [Screenshot tracks: constrained elements; conservation score; PECAN alignments; BLASTZ mouse; tBLAT zebrafish]

  24. MultiContigView [Screenshot labels: conserved sequences in human; conserved sequences in dog]

  25. AlignSliceView [Screenshot: aligned slices for human, mouse, dog and rat]

  26. MultiContigView vs. AlignSliceView

  27. AlignView

  28. GeneSeqalignView

  29. GeneSeqalignView

  30. Syntenic Blocks • Genome alignments are refined into larger syntenic regions • Alignments are clustered together when they are less than 100 kb apart and their order and orientation are consistent • Clusters shorter than 100 kb are discarded
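The clustering rule can be sketched as a single pass over alignments sorted along the reference genome. Several details are assumptions not stated on the slide: the 100 kb distance is checked on both genomes, overlapping alignments are not merged, "order and orientation are consistent" means the other genome's coordinates move forward on the same strand and backward on the reverse strand, and block size is measured on the reference. The `Hit` record and all coordinates are hypothetical.

```python
from typing import NamedTuple

MAX_GAP = 100_000   # alignments less than 100 kb apart may be joined
MIN_SIZE = 100_000  # clusters shorter than 100 kb are discarded

class Hit(NamedTuple):
    chrom: str     # reference chromosome (e.g. human)
    start: int
    end: int
    o_chrom: str   # other species' chromosome (e.g. mouse)
    o_start: int
    o_end: int
    strand: int    # +1 same orientation, -1 reversed

def consistent(prev, cur):
    """True if cur can extend the cluster whose last alignment is prev (assumed rule)."""
    if (cur.chrom, cur.o_chrom, cur.strand) != (prev.chrom, prev.o_chrom, prev.strand):
        return False
    ref_gap = cur.start - prev.end
    if cur.strand == 1:
        o_gap = cur.o_start - prev.o_end  # must move forward on the other genome
    else:
        o_gap = prev.o_start - cur.o_end  # must move backward when the block is reversed
    return 0 <= ref_gap < MAX_GAP and 0 <= o_gap < MAX_GAP

def syntenic_blocks(hits):
    """Chain consistent alignments, then keep chains spanning at least MIN_SIZE on the reference."""
    blocks, current = [], []
    for hit in sorted(hits):  # NamedTuples sort by reference chromosome, then start
        if current and consistent(current[-1], hit):
            current.append(hit)
        else:
            if current:
                blocks.append(current)
            current = [hit]
    if current:
        blocks.append(current)
    spans = [(b[0].chrom, b[0].start, max(h.end for h in b)) for b in blocks]
    return [span for span in spans if span[2] - span[1] >= MIN_SIZE]

hits = [
    Hit("1", 0, 40_000, "4", 1_000_000, 1_040_000, 1),
    Hit("1", 60_000, 120_000, "4", 1_070_000, 1_130_000, 1),
    Hit("1", 150_000, 170_000, "4", 1_150_000, 1_170_000, 1),
    Hit("1", 500_000, 530_000, "7", 200_000, 230_000, -1),   # isolated and short: discarded
    Hit("2", 0, 80_000, "9", 500_000, 580_000, -1),
    Hit("2", 90_000, 150_000, "9", 400_000, 460_000, -1),    # reversed, order still consistent
]
print(syntenic_blocks(hits))  # [('1', 0, 170000), ('2', 0, 150000)]
```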

  31. SyntenyView [Screenshot labels: human chromosome; orthologues; mouse chromosomes]

  32. CytoView [Screenshot labels: syntenic blocks; orientation; chromosome]

  33. Q & A
