
How to access genomic information using Ensembl

Damian Smedley and Xosé Fernández, Ensembl Project, European Bioinformatics Institute, Cambridge, UK. November 2004.


Presentation Transcript


  1. How to access genomic information using Ensembl. Damian Smedley and Xosé Fernández, Ensembl Project, European Bioinformatics Institute, Cambridge, UK. November 2004

  2. Schedule
     Today
     • Introduction to the Ensembl system
     • Hands-on examples to introduce the system
     • Evaluating genes and transcripts
     • Variation in Ensembl (SNPs, haplotypes)
     Tomorrow
     • Data mining with EnsMart
     • Comparative genomics and proteomics in Ensembl
     • BioMart
     • Advanced topics (upload your own data, DAS)

  3. Our goal

  4. Assembly: from 325,109 initial contigs to 26,720 overlapping clones; with other ordering data, a non-redundant “virtual contig” view
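The idea of turning overlapping clones into a non-redundant view can be illustrated with a small interval-merging sketch. This is only a toy under stated assumptions: the coordinates are made up, the function name `merge_overlapping` is hypothetical, and it is not the procedure Ensembl or the sequencing consortium actually used.

```python
from typing import List, Tuple

# (start, end) of a clone on one chromosome, inclusive coordinates.
Interval = Tuple[int, int]


def merge_overlapping(clones: List[Interval]) -> List[Interval]:
    """Collapse overlapping clone intervals into non-redundant spans."""
    merged: List[Interval] = []
    for start, end in sorted(clones):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous span: extend it instead of adding a new one.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical clone placements (made-up coordinates, not real data).
clones = [(1, 150_000), (120_000, 270_000), (400_000, 550_000), (260_000, 300_000)]
contigs = merge_overlapping(clones)
print(contigs)
```

Sorting first means each clone only has to be compared with the most recent merged span, so the whole pass is a single loop after the sort.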

  5. Mapping and sequencing the human genome
     Figure labels: BACs (bacterial artificial chromosomes), avg size 150 kb; pUCs, avg size 2-4 kb; fragment; map; WGS sequence assembly; draft; finished BAC
     Citations: Shizuya et al 1992; Dib et al 1996; Deloukas et al 1998; Osoegawa et al 2001; Bentley et al 2001; Bruls et al 2001; McPherson et al 2001; Montgomery et al 2001; Tilford et al 2001

  6. Status of the human sequence
     • Finished (red/orange): ~96% (99.999% accurate)
     • Heterochromatin (grey): ~4%
     • 30-40% repetitive elements (e.g. alpha satellite, Alu repeats)
     • All known genes correctly identified (99.74%)
     • Assembled draft sequence totals 2.85 Gb

  7. Human genome: current status
     • 22,287 ‘gene loci’ defined, consisting of 19,599 protein-coding genes in the human genome and 2,188 additional DNA segments ‘predicted’ to be protein-coding genes
     • 1,183 genes ‘were born’ in the last 60-100 My
     • ~30 genes ‘died’ in a similar time period
     Source: Finishing the euchromatic sequence of the human genome, Nature 431:931-45 (2004)

  8. Ensembl - project aims
     • funded to provide metazoan genomes to the world
     • aims to provide the world’s best automated genome annotation
     • a leading group for human and mouse analysis
     • all software, data and results freely available

  9. Ensembl - project background
     • group split between EBI and Sanger
     • mainly Wellcome Trust funded
     • largest dedicated compute in biology in Europe
     • developer community > 100 people, including companies

  10. Ensembl – Open source
      • Freely available
      • Community development
      • >51 Ensembl installs worldwide
      • Both public and commercial, e.g. Gramene (CSHL), Fugu-sg (IMCB), Ciona-sg (Temasek)

  11. Diagram: SNP, Manual Annotation, Ensembl, Supporting Databases, Final DB, Analysis DB, CPU

  12. Genome browsing: why present the whole genome?
      • Explore what is in a chromosome region
      • See features in and around a specific gene
      • Search & retrieve across the whole genome
      • Investigate genome organization
      • Compare to other genomes

  13. Genome browsers
      • Ensembl – public site + installable system: http://www.ensembl.org
      • UCSC Human Genome Browser: http://genome.ucsc.edu
      • NCBI Map Viewer: http://www.ncbi.nlm.nih.gov/mapview

  14. Introduction to the Ensembl web site
      Ensembl …
      • … takes genomic sequence assemblies (human build 34, mouse, rat, Fugu, mosquito)
      • … adds annotation and links (automated process)
      • … presents all the data on a web site

  15. Annotation: genes
      Novel genes
      • how to predict?
      • require evidence
      • transcript(s)?
      • protein(s)?
      • orthologues?
      • attach useful links
      Known genes
      • where?
      • genomic structure?
      • transcript(s)?
      • protein(s)?
      • orthologues?
      • attach useful links

  16. Annotation: other features
      • markers and SNPs
      • cytogenetic bands
      • repeated sequences
      • ESTs & other sequence records: where do they show sequence similarity?
      • regions homologous to other species

  17. How to get started …
      • Species homepage
      • Site map
      • MapView
      • Text search
      • BLAST
      • SSAHA
      • DiseaseView

  18. Homepage

  19. Site map

  20. AnchorView MapView

  21. BLAST and SSAHA

  22. BLAST and SSAHA

  23. Regions, maps and markers: ContigView, CytoView, SyntenyView, MultiContigView, MarkerView, SNPView

  24. Ensembl ContigView

  25. ContigView close-up
      • Customising & short cuts
      • Transcripts: red & black (Ensembl predictions), blue (Vega)
      • Evidence
      • Pop-up menu

  26. ContigView - Chromosome 20 close-up
      Labels: Manual annotation via Vega; Forward strand; Ensembl predictions; Reverse strand; Ensembl EST-based predictions
      Other chromosomes with manual annotation from http://vega.sanger.ac.uk: 6, 7, 9, 10, 13, 14, 20, 22, X

  27. CytoView

  28. GeneSNPView

  29. MarkerView SNPView

  30. SyntenyView

  31. MultiContigView

  32. Genes & gene products: GeneView, TransView, ExonView, ProteinView, FamilyView, DomainView, GOView, DiseaseView

  33. Ensembl GeneView

  34. ExonView TransView

  35. ProteinView

  36. FamilyView

  37. GOView

  38. DiseaseView

  39. Data retrieval
      • EnsMart
      • ExportView
      • Data sets on ftp site
      • MySQL queries of databases
      • Perl API access to databases
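As a rough illustration of the "MySQL queries of databases" route, the sketch below runs a region query against a tiny in-memory table. It makes several assumptions: Python's built-in `sqlite3` stands in for a MySQL server, and the table `toy_gene`, its columns, the gene names and the coordinates are all hypothetical and are not the Ensembl schema. Only the overlap condition carries over to a real query.

```python
import sqlite3
from typing import List

# In-memory stand-in database; a real query would go to a MySQL server instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE toy_gene (name TEXT, chrom TEXT, start_pos INTEGER, end_pos INTEGER)"
)
# Hypothetical rows, invented for this example.
conn.executemany(
    "INSERT INTO toy_gene VALUES (?, ?, ?, ?)",
    [
        ("geneA", "20", 1000, 5000),
        ("geneB", "20", 4800, 9000),
        ("geneC", "20", 20000, 25000),
        ("geneD", "X", 1000, 5000),
    ],
)


def genes_in_region(db: sqlite3.Connection, chrom: str, start: int, end: int) -> List[str]:
    """Return names of genes overlapping chrom:start-end, ordered by start."""
    rows = db.execute(
        # A feature overlaps the region if it starts before the region ends
        # and ends after the region starts.
        "SELECT name FROM toy_gene "
        "WHERE chrom = ? AND start_pos <= ? AND end_pos >= ? "
        "ORDER BY start_pos",
        (chrom, end, start),
    ).fetchall()
    return [row[0] for row in rows]


print(genes_in_region(conn, "20", 4900, 10000))
```

Passing the region as bound parameters (the `?` placeholders) keeps user-supplied coordinates out of the SQL text itself.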

  40. ExportView

  41. EnsMart

  42. Mouse differences
      • Genomic sequence assembly based on whole genome shotgun, with finished ‘stitched’ BACs
      • BACs are shown in CytoView (FPC map), but for most no sequence is available

  43. Mouse CytoView

  44. Help! HelpDesk / Suggestions
      • context-sensitive help pages - click
      • access other documentation via generic home page
      • email the helpdesk

  45. Thanks Ensembl Team

  46. Ensembl Team November 2004
