Comparative Genomics

Presentation Transcript


  1. Comparative Genomics

  2. Overview of the Talk • Comparing Genomes • Homologies & Families • Sequence Alignments

  3. Evolution at the DNA Level • Sequence edits: deletion and mutation, e.g. …ACTGACATGTACCA… becomes …AC----CATGCACCA… • Rearrangements: inversion, translocation, duplication
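
The edit types on this slide can be made concrete with a few string operations. This is an illustrative sketch only: the function names are hypothetical, positions are 0-based, and real genomes are not edited as Python strings. Applying a three-base deletion (TGA) and a T-to-C substitution to the slide's example sequence reproduces the edited sequence once the gap characters are dropped.

```python
# Illustrative sketch: each function returns a new sequence with one
# kind of evolutionary edit applied. Positions are 0-based.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def delete(seq: str, start: int, length: int) -> str:
    """Deletion: remove `length` bases starting at `start`."""
    return seq[:start] + seq[start + length:]


def substitute(seq: str, pos: int, base: str) -> str:
    """Point mutation: replace the single base at `pos`."""
    return seq[:pos] + base + seq[pos + 1:]


def invert(seq: str, start: int, end: int) -> str:
    """Inversion: replace seq[start:end] by its reverse complement."""
    segment = seq[start:end].translate(COMPLEMENT)[::-1]
    return seq[:start] + segment + seq[end:]


def translocate(seq: str, start: int, end: int, dest: int) -> str:
    """Translocation: cut seq[start:end] and reinsert it at `dest` in what remains."""
    segment = seq[start:end]
    rest = seq[:start] + seq[end:]
    return rest[:dest] + segment + rest[dest:]


def duplicate(seq: str, start: int, end: int) -> str:
    """Tandem duplication: insert a copy of seq[start:end] right after itself."""
    return seq[:end] + seq[start:end] + seq[end:]


original = "ACTGACATGTACCA"
edited = substitute(delete(original, 2, 3), 6, "C")  # delete TGA, then T -> C
print(edited)  # ACCATGCACCA
```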

  4. Why Compare Genomes? • To better understand evolution and speciation • To find important functional regions of the sequence (codons, promoters, regulatory regions) • To locate genes in other species that are missing or poorly defined, again through comparison and alignment

  5. Comparing Genomes • Mammals have roughly 3 billion base pairs in their genomes • Over 98% of human genes are shared with other primates, with 95-98% or more similarity between genes • Even the fruit fly shares 60% of its genes with humans! (March 2000) • Differences lie in gene structure and sequence. Remember: a single nucleotide change can cause disease, e.g. sickle cell anemia, or contribute to cancer.
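
Similarity figures like these are derived from an alignment, and one common measure is percent identity. A minimal sketch, assuming the two sequences are already aligned (equal length, `-` for gaps) and that gap columns are simply excluded; other conventions exist and give different numbers. The sequences are made-up examples.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned, gap-free columns in which the two sequences agree."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap (one convention of several).
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Hypothetical aligned sequences differing at a single position.
print(percent_identity("ACGTACGTAC", "ACGTACGTTC"))  # 90.0
```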

  6. How Does Ensembl Predict Homology? • Uses all the species • Uses a representative protein (the longest) for every gene • Builds a gene tree • EnsemblCompara GeneTrees: Analysis of complete, duplication-aware phylogenetic trees in vertebrates. Vilella AJ, Severin J, Ureta-Vidal A, Durbin R, Heng L, Birney E. Genome Res. 2008 Nov 24.

  7. Steps in Homology Prediction • Load the longest protein for every gene from all species (e.g. ..MEDPATA…) • WU-BLASTP + Smith-Waterman: the longest translation of every gene against every other (Best Reciprocal Hit / Blast Score Ratio) • Cluster the proteins and build multiple alignments (MCoffee) • From each alignment, build a gene tree • Reconcile each gene tree with the species tree to determine the internal nodes (TreeBeST) • Result: orthologues, paralogues…
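
Smith-Waterman, named in the pipeline above, is a dynamic-programming local alignment. The sketch below computes only the best local alignment score, with an assumed toy scoring scheme (match +2, mismatch -1, linear gap -2). Production protein searches use a substitution matrix such as BLOSUM62 and affine gap penalties, and WU-BLASTP itself is a heuristic rather than this exhaustive algorithm. The example fragment `MEDPATA` is taken from the slide.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between sequences a and b (linear gap penalty)."""
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diagonal = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # a[i-1] aligned to a gap in b
            left = H[i][j - 1] + gap    # b[j-1] aligned to a gap in a
            # Clamping at 0 is what makes the alignment local.
            H[i][j] = max(0, diagonal, up, left)
            best = max(best, H[i][j])
    return best


print(smith_waterman("MEDPATA", "MEDPATA"))  # 14: seven matches
print(smith_waterman("XXMEDPYY", "QMEDPQ"))  # 8: the shared MEDP core
```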

  8. Viewing Trees in Ensembl

  9. Types of Homologues • Orthologues : any gene pairwise relation where the ancestor node is a speciation event • Paralogues : any gene pairwise relation where the ancestor node is a duplication event
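
Under these definitions, classifying a gene pair only requires finding the pair's last common ancestor in the gene tree and checking how that node is labelled. A minimal sketch on a hypothetical four-gene tree (node names, gene names and labels are invented), assuming internal nodes are already labelled as speciation or duplication:

```python
# Hypothetical gene tree (internal nodes labelled by event):
#   D (duplication)
#   ├── S1 (speciation): human_A, mouse_A
#   └── S2 (speciation): human_B, mouse_B
PARENT = {
    "human_A": "S1", "mouse_A": "S1",
    "human_B": "S2", "mouse_B": "S2",
    "S1": "D", "S2": "D",
}
EVENT = {"S1": "speciation", "S2": "speciation", "D": "duplication"}


def path_to_root(node: str) -> list[str]:
    """The node itself followed by all of its ancestors, nearest first."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def last_common_ancestor(a: str, b: str) -> str:
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("genes are not in the same tree")


def homology_type(a: str, b: str) -> str:
    event = EVENT[last_common_ancestor(a, b)]
    return "orthologue" if event == "speciation" else "paralogue"


print(homology_type("human_A", "mouse_A"))  # orthologue
print(homology_type("human_A", "human_B"))  # paralogue
```

Note that human_A and mouse_B are also paralogues, even though they come from different species, because their last common ancestor is the duplication node.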

  10. The Gene Tree for INS (insulin precursor) • A blue square marks a speciation event (orthologues) • A red square marks a duplication event (paralogues)

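  11. Reconciliation [Figure: an unrooted gene tree for genes from species M, H and R is reconciled with the species tree of M, R and H, marking duplication and speciation nodes; duplicated copies are labelled M’, H’, R’, and several gene losses are shown.]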
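
Reconciliation can be sketched with the classic last-common-ancestor (LCA) mapping: each internal gene-tree node is mapped to the LCA, in the species tree, of the species below it, and it is called a duplication when it maps to the same species-tree node as one of its children. This is a simplified sketch, not TreeBeST's implementation. It assumes a rooted binary gene tree (the slide starts from an unrooted one), a hypothetical species tree ((M, R), H), and gene names whose first letter gives their species.

```python
# Hypothetical species tree ((M, R), H), stored as child -> parent.
SPECIES_PARENT = {"M": "MR", "R": "MR", "MR": "MRH", "H": "MRH"}


def species_path(s: str) -> list[str]:
    """The species-tree node itself followed by its ancestors, nearest first."""
    path = [s]
    while s in SPECIES_PARENT:
        s = SPECIES_PARENT[s]
        path.append(s)
    return path


def species_lca(a: str, b: str) -> str:
    above_a = set(species_path(a))
    return next(s for s in species_path(b) if s in above_a)


def reconcile(gene_tree, events: list) -> str:
    """Map `gene_tree` (a gene name or a (left, right) tuple) onto the species tree.

    Each internal node is appended to `events` as (node, species_node, event).
    """
    if isinstance(gene_tree, str):
        return gene_tree[0]  # assumption: first letter of a gene name = its species
    left, right = (reconcile(child, events) for child in gene_tree)
    mapped = species_lca(left, right)
    event = "duplication" if mapped in (left, right) else "speciation"
    events.append((gene_tree, mapped, event))
    return mapped


# Hypothetical rooted gene tree: ((M1, H1), (M2, R2)).
events = []
reconcile((("M1", "H1"), ("M2", "R2")), events)
for node, species_node, event in events:
    print(node, species_node, event)
```

The root maps to the same species node (MRH) as its left child, so it is a duplication. The two resulting copies then imply gene losses (no R copy under (M1, H1), no H copy under (M2, R2)), the kind of losses drawn in the slide's figure.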

  12. Orthologue Types • What is ‘1 to 1’? • What is ‘1 to many’?
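
One way to make ‘1 to 1’ and ‘1 to many’ concrete is to count, for each orthologous pair between two species, how many orthologues each gene has in the other species. The sketch below is a simplification under that assumption (Ensembl derives its labels from the gene trees), and the gene names are hypothetical.

```python
from collections import defaultdict


def orthology_types(pairs: list[tuple[str, str]]) -> dict[tuple[str, str], str]:
    """Label each (gene in species X, gene in species Y) orthologue pair."""
    partners = defaultdict(set)
    for a, b in pairs:
        partners[a].add(b)
        partners[b].add(a)
    labels = {}
    for a, b in pairs:
        n_a, n_b = len(partners[a]), len(partners[b])
        if n_a == 1 and n_b == 1:
            labels[(a, b)] = "1 to 1"
        elif n_a == 1 or n_b == 1:
            labels[(a, b)] = "1 to many"
        else:
            labels[(a, b)] = "many to many"
    return labels


# Hypothetical human/fish orthologues: geneX has two copies in fish.
pairs = [("human_geneX", "fish_geneX_a"), ("human_geneX", "fish_geneX_b"),
         ("human_geneY", "fish_geneY")]
for pair, label in orthology_types(pairs).items():
    print(pair, label)
```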

  13. Protein Families • How: cluster the proteins of every isoform in every species together with UniProt proteins • Based on a BLASTP comparison of: all Ensembl proteins (ENSP…) and all metazoan (animal) proteins in UniProt
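
The clustering step can be pictured as grouping proteins connected by sufficiently strong BLASTP hits. The sketch below swaps in the simplest option, single-linkage clustering (connected components via union-find), which is not necessarily the method Ensembl uses; the identifiers, scores and the `min_score` cut-off are all hypothetical.

```python
def cluster_families(proteins, hits, min_score=50.0):
    """Group proteins into families: connected components of hits scoring >= min_score."""
    parent = {p: p for p in proteins}

    def find(p):
        # Walk up to the family representative, halving the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b, score in hits:
        if score >= min_score:
            parent[find(a)] = find(b)  # merge the two families

    families = {}
    for p in proteins:
        families.setdefault(find(p), []).append(p)
    return sorted(sorted(members) for members in families.values())


# Hypothetical Ensembl and UniProt identifiers with made-up BLASTP scores.
proteins = ["ENSP_1", "ENSP_2", "ENSP_3", "UNIPROT_1"]
hits = [("ENSP_1", "UNIPROT_1", 120.0),
        ("UNIPROT_1", "ENSP_2", 80.0),
        ("ENSP_2", "ENSP_3", 20.0)]  # below the cut-off: not merged
print(cluster_families(proteins, hits))
```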

  14. Homologues Exercise • Find the human MYL6 gene: go to its gene summary. • How many paralogues does it have? Find them in the gene tree. • Which paralogue is closest to the human MYL6 gene? In what taxon is the common ancestor?

  15. Pan-Compara (Ensembl Genomes) • Protists (x2): Plasmodium falciparum, Plasmodium vivax • Bacteria and archaea (x8): Bacillus subtilis, Escherichia coli K12, Mycobacterium tuberculosis H37Rv, Neisseria meningitidis A 4A, Pyrococcus horikoshii, Staphylococcus aureus N315, Streptococcus pneumoniae TIGR4, Streptococcus pyogenes M1 SF370 • Fungi (x2): Saccharomyces cerevisiae, Schizosaccharomyces pombe • Plants (x3): Arabidopsis thaliana, Oryza sativa japonica, Vitis vinifera • Chordates (x13): Anolis carolinensis, Ciona savignyi, Danio rerio, Equus caballus, Gallus gallus, Homo sapiens, Macaca mulatta, Monodelphis domestica, Mus musculus, Ornithorhynchus anatinus, Pan troglodytes, Pongo pygmaeus, Xenopus tropicalis • Invertebrates (x3): Anopheles gambiae, Caenorhabditis elegans, Drosophila melanogaster

  16. www.ensemblgenomes.org

  17. Families

  18. Ensembl Proteins in the Family

  19. Overview of the Talk • Comparing Genomes • Homologies and Families • Sequence Alignments

  20. Non-Coding Regions • Large stretches of non-coding sequence in vertebrates • These include regulatory regions of developmental genes, transcription factors and miRNAs (Kikuta et al., Genome Research, May 2007)

  21. Comparative Genomics today

  22. Aligning Whole Genomes: Why? • To identify homologous regions • To spot problematic gene predictions • Conserved regions could be functional • To define syntenic regions (long stretches of DNA in which the order and orientation of the sequence are highly conserved)
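
Syntenic blocks can be illustrated with a toy version of the idea: walk along the genes of one genome and extend a block while the next gene is also the next gene, on the same strand, in the other genome. This is a deliberately simplified sketch (real synteny pipelines typically tolerate small gaps and also detect blocks that are inverted as a whole); the gene names and orders are hypothetical.

```python
def collinear_blocks(genome_a, genome_b, min_len=2):
    """Runs of genes with the same order and strand in both genomes.

    Each genome is a list of (gene, strand) tuples in chromosomal order.
    """
    position_in_b = {gene: (i, strand) for i, (gene, strand) in enumerate(genome_b)}
    blocks, current, prev_index = [], [], None
    for gene, strand in genome_a:
        hit = position_in_b.get(gene)
        same_strand = hit is not None and hit[1] == strand
        if same_strand and current and hit[0] == prev_index + 1:
            current.append(gene)  # also the next gene in B: extend the block
        else:
            if len(current) >= min_len:
                blocks.append(current)
            current = [gene] if same_strand else []
        prev_index = hit[0] if same_strand else None
    if len(current) >= min_len:
        blocks.append(current)
    return blocks


genome_a = [("g1", "+"), ("g2", "+"), ("g3", "+"), ("g4", "-"), ("g5", "+")]
genome_b = [("g1", "+"), ("g2", "+"), ("g3", "+"), ("x", "+"), ("g5", "+"), ("g4", "+")]
print(collinear_blocks(genome_a, genome_b))
```

Here g4 breaks the block because its strand differs between the two genomes, and g5 on its own is shorter than `min_len`.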

  23. Aligning large genomic sequences • Difficulties: • Requires significant computing resources • Scalability, as more and more genomes are sequenced • Time constraints • Because the “true” alignment is not known, it is difficult to measure alignment accuracy and to choose the right method
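
A rough calculation shows why computing resources and time top the list of difficulties. Exact dynamic-programming alignment fills one matrix cell per pair of positions, so its cost grows with the product of the sequence lengths. The throughput figure below is an assumed round number, not a benchmark.

```python
genome_length = 3_000_000_000       # ~3 billion bp per mammalian genome
cells = genome_length ** 2          # one DP cell per pair of positions
cells_per_second = 1_000_000_000    # assumed throughput: 1e9 cells per second
seconds = cells / cells_per_second
years = seconds / (365.25 * 24 * 3600)
print(f"{cells:.1e} cells, roughly {years:.0f} years for one pairwise alignment")
```

This is why whole-genome aligners such as BLASTZ rely on seed-and-extend heuristics instead of the exhaustive algorithm.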

  24. Whole Genome Alignments • BLASTZ-net (nucleotide level) for closer species, e.g. human–mouse • Translated BLAT (amino-acid level) for more distant species, e.g. human–zebrafish • EPO/PECAN for multiple-species alignments • ORTHEUS, used to determine ancestral alleles
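
Translated comparisons work in amino-acid space, where distant species remain easier to match because synonymous codon changes vanish on translation. Conceptually, the DNA is first translated in all six reading frames (three per strand). The sketch below shows only that translation step with the standard genetic code; it is not BLAT, and codons containing anything other than A, C, G or T become `X` by assumption.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, ..., GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate complete codons from the start of `dna`; unknown codons -> X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> list[str]:
    """Frames 0-2 of the forward strand, then frames 0-2 of the reverse complement."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:]) for strand in (dna, reverse) for offset in range(3)]


print(six_frames("ATGGCCTAA"))
```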

  25. Alignments Exercise • Find the Ensembl MYH2 gene for human and go to Region in Detail. • Turn on the BLASTZ alignment against cow. What part of the cow genome aligns to this region in human? • Jump to the region in cow.

  26. Alignments Exercise • Go back to the human page • Use the Alignments (text) and Multi-species view links to explore the alignments.

  27. Conserved Regions Exercise • Go back to Region in Detail • Turn on the conservation score for 31 species and the constrained elements tracks • Where are the regions of high conservation? • Click on the regulatory feature that corresponds to a highly conserved block of sequence. What is it?
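
The constrained-elements track marks stretches where conservation is consistently high. A toy version of that idea is to scan per-base conservation scores for runs at or above a threshold that are long enough to be worth reporting. Ensembl derives its track with a dedicated statistical method; the threshold, minimum length and scores here are hypothetical.

```python
def high_scoring_runs(scores, threshold=1.0, min_len=3):
    """Half-open (start, end) intervals where every score is >= threshold
    and the run spans at least min_len positions."""
    runs, start = [], None
    # A -inf sentinel at the end closes any run still open at the last position.
    for i, score in enumerate(list(scores) + [float("-inf")]):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


scores = [0.2, 1.5, 2.0, 1.1, 0.3, 1.2, 1.9, -0.5]  # made-up per-base scores
print(high_scoring_runs(scores))
```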

  28. Ancestral Alleles • Go to the variation tab for rs34161789 and follow the Phylogenetic Context link • What is the allele in the four primates? Hint: either go to the gene tab and click on the SNP ID in the variation table, or do a new search for rs34161789.
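
Inferring an ancestral allele means reconstructing the state at an internal node of the species tree from the alleles observed at the leaves. ORTHEUS does this probabilistically on the whole alignment; the much simpler stand-in below is Fitch parsimony for a single site. The tree is the standard great-ape topology, but the alleles are invented for illustration and are not the data for rs34161789.

```python
def fitch_root_states(tree, alleles):
    """Bottom-up Fitch pass: candidate ancestral states at the root of `tree`.

    `tree` is a species name or a (left, right) tuple; `alleles` maps species to bases.
    """
    if isinstance(tree, str):
        return {alleles[tree]}
    left, right = (fitch_root_states(child, alleles) for child in tree)
    # Intersection if the children share a state, otherwise the union.
    return (left & right) or (left | right)


tree = ((("human", "chimp"), "gorilla"), "orangutan")
alleles = {"human": "T", "chimp": "C", "gorilla": "C", "orangutan": "C"}  # made up
print(fitch_root_states(tree, alleles))  # {'C'}
```

Here the single T in human is most parsimoniously explained as a change on the human branch, so C is inferred as ancestral. When the root set holds more than one base, parsimony alone cannot decide.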

  29. Compara Team at EBI • Javier Herrero • Kathryn Beal • Stephen Fitzgerald • Albert Vilella

  30. End of Course Survey • Exercises are on page 43; answers are on page 44.
