Data Mining in Ensembl with BioMart

Data Mining in Ensembl with BioMart. www.ensembl.org/biomart/martview, www.biomart.org/biomart/martview. Nov, 2009. BioMart is a search engine that can find multiple terms and put them into a table format, for example mouse gene IDs, chromosome and base pair position.

joshwa

Presentation Transcript


  1. Data Mining in Ensembl with BioMart • www.ensembl.org/biomart/martview • www.biomart.org/biomart/martview • Nov, 2009

  2. BioMart: Data mining • BioMart is a search engine that can find multiple terms and put them into a table format. • For example: mouse gene IDs, chromosome and base pair position • No programming required!

  3. General or Specific Data-Tables • All the genes for one species • Or… only genes on one specific region of a chromosome • Or… genes on one region of a chromosome associated with an InterPro domain

  4. The First Step: Choose the Dataset • Dataset: Current Ensembl, Human genes

  5. The Second Step: Filters • Filters: Define a gene set

  6. Attributes attach information • Attributes: Determine output columns

  7. Results • Tables or sequences

  8. Query: • For the human CFTR gene, can I export the EntrezGene ID, and also the probes matching this gene sequence from the “Affy HG U133 Plus 2” microarray platform? • In the query: Filters are what we know; Attributes are what we want to know.

  10. Query: • For the human CFTR gene, can I export the EntrezGene ID, and also the probes matching this gene sequence from the “Affy HG U133 Plus 2” microarray platform? • In the query: Filters are what we know; Attributes are what we want to know (columns in the result table).

  11. A Brief Example • Use the current Ensembl (archives are also available) • Select Homo sapiens

  12. Select the genes with Filters • Click ‘Filters’. • Expand the ‘REGION’ panel. • Expand the ‘GENE’ panel to enter the gene ID(s).

  13. Filters • Change the ID type to “HGNC symbol”. • Enter “CFTR” in the box. • Click “Count” to see how many genes pass your filters.

  14. Attributes (Output Options) • Click on ‘Attributes’. • Expand the ‘GENE’ section.

  15. Attributes (Output Options) • Select “Description” and “Associated Gene Name”. • Expand the ‘EXTERNAL’ panel for non-Ensembl IDs.

  16. Attributes (Output) • External IDs include EntrezGene IDs and also microarray probe IDs.

  17. The Results Table: Preview • For the full result table, click “Go” or view “ALL” rows. • “Results” shows Description, Name, EntrezGene and probe matches from the Affy HG U133-Plus-2 platform.

  18. Full Result Table • Labelled columns: Affy HG probe; Gene Name; EntrezGene ID; Ensembl Gene and Transcript IDs; Description
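
The web interface needs no programming, but the same CFTR query can also be written down as a BioMart XML query, which some readers may find easier to save and repeat. The following is a minimal sketch only. It assumes a `martservice` endpoint alongside the `martview` URL on the slides, a dataset called `hsapiens_gene_ensembl`, and internal filter and attribute names (`hgnc_symbol`, `entrezgene_id`, `affy_hg_u133_plus_2`, and so on) that are not given in the slides and may differ between Ensembl releases. The helper `build_query` is hypothetical, and the sketch only builds the request; it does not send it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: a "martservice" endpoint exists next to the martview page.
MARTSERVICE_URL = "http://www.ensembl.org/biomart/martservice"


def build_query(dataset, filters, attributes):
    """Build a BioMart-style XML query string (hypothetical helper).

    dataset    -- internal dataset name (step 1: choose the dataset)
    filters    -- dict of filter name -> value (step 2: what we know)
    attributes -- list of attribute names (step 3: output columns)
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",   # tab-separated result table
        "header": "1",        # include column headers
        "uniqueRows": "1",    # drop duplicate rows
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# All names below are assumed internal names, not taken from the slides.
cftr_xml = build_query(
    dataset="hsapiens_gene_ensembl",
    filters={"hgnc_symbol": "CFTR"},
    attributes=[
        "ensembl_gene_id",
        "description",
        "external_gene_name",
        "entrezgene_id",
        "affy_hg_u133_plus_2",
    ],
)

# The XML is passed as the "query" parameter of the request URL.
request_url = MARTSERVICE_URL + "?" + urlencode({"query": cftr_xml})
print(cftr_xml)
```

The “XML” button in the MartView interface shows the query for the current selection, which is a convenient way to check the real internal names before relying on a sketch like this.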

  19. Other Export Options (Attributes) • Sequences: UTRs, flanking sequences, cDNA, peptides, etc. • Gene IDs from Ensembl and external sources (MGI, Entrez, etc.) • Microarray data • Protein functions/descriptions (InterPro, GO) • Orthologous gene sets • SNP/Variation data

  20. BioMart Data Sets • Ensembl genes • Vega genes • Variations

  21. BioMart around the world… • BioMart started at Ensembl… Where has it travelled to?

  22. Central Portal www.biomart.org

  23. WormBase

  24. HapMap • Population frequencies • Inter-population comparisons • Gene annotation

  25. DictyBase

  26. GRAMENE www.gramene.org

  27. The Potato Center

  28. How to Get There http://www.biomart.org/biomart/martview http://www.ensembl.org/biomart/martview • Or click on ‘BioMart’ from Ensembl

  29. The Flow • Choose Dataset (All genes for a species) • Choose Filters (narrows the gene set) • Choose Attributes (output options) Now Try the Worked Example on Page 23!
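
As an illustration of this flow, the sketch below models a dataset as a list of records, filters as tests that narrow the set, and attributes as the columns kept in the output. It also mirrors the three cases from the “General or Specific Data-Tables” slide. Everything here is hypothetical: the records are placeholders (not real Ensembl data) and `run_query` is an invented helper, not part of BioMart.

```python
# Hypothetical toy "dataset": placeholder records, not real Ensembl data.
DATASET = [
    {"gene_id": "GENE_A", "chromosome": "7", "start": 100, "end": 500,
     "domains": ["DOMAIN_X"]},
    {"gene_id": "GENE_B", "chromosome": "7", "start": 900, "end": 1500,
     "domains": ["DOMAIN_Y"]},
    {"gene_id": "GENE_C", "chromosome": "2", "start": 200, "end": 700,
     "domains": ["DOMAIN_X"]},
]


def run_query(dataset, filters, attributes):
    """Keep records passing every filter, then output only the chosen columns."""
    rows = [rec for rec in dataset if all(test(rec) for test in filters)]
    return [[rec[name] for name in attributes] for rec in rows]


def in_region(rec):
    """Filter: gene lies within chromosome 7, positions 0 to 1000."""
    return rec["chromosome"] == "7" and rec["start"] >= 0 and rec["end"] <= 1000


def has_domain_x(rec):
    """Filter: gene is associated with the placeholder domain DOMAIN_X."""
    return "DOMAIN_X" in rec["domains"]


# 1. No filters: all genes in the dataset.
all_genes = run_query(DATASET, [], ["gene_id"])

# 2. One filter: only genes in one region of a chromosome.
region_genes = run_query(DATASET, [in_region], ["gene_id", "start", "end"])

# 3. Two filters: genes in that region that also carry a given domain.
region_domain_genes = run_query(
    DATASET, [in_region, has_domain_x], ["gene_id", "chromosome"]
)

print(all_genes)
print(region_genes)
print(region_domain_genes)
```

Each added filter can only shrink the gene set, while the attribute list changes the columns without changing which genes are returned.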

  30. Ensembl Core Databases • Relational database • Normalised: each data point stored only once • Therefore: quick updates, minimal storage requirements • But: many tables, many joins for complicated queries, slow for data mining applications

  31. Normalised Schema

  32. BioMart Database • Data warehouse • De-normalised • Query-optimised • Therefore: fast and flexible, ideal for data mining • But: tables with apparent “redundancy”; needs rebuilding from scratch from the normalised core databases for every release

  33. De-Normalised Schema
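
The trade-off between the two schema styles can be sketched with a small in-memory SQLite example. The tables, columns and values below are hypothetical placeholders chosen for illustration; they are not the actual Ensembl core or BioMart schemas. The normalised tables need joins to answer a question, whereas the de-normalised table is rebuilt from them once and then read directly, at the cost of repeating the gene name on every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalised (hypothetical toy schema): each fact is stored once and
# linked to related facts by keys.
cur.executescript("""
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE transcript (transcript_id INTEGER PRIMARY KEY,
                         gene_id INTEGER, name TEXT);
CREATE TABLE xref (transcript_id INTEGER, external_id TEXT);
INSERT INTO gene VALUES (1, 'GENE_A');
INSERT INTO transcript VALUES (10, 1, 'TRANSCRIPT_A1'), (11, 1, 'TRANSCRIPT_A2');
INSERT INTO xref VALUES (10, 'PROBE_1'), (11, 'PROBE_2');
""")

# Answering "which probes match which gene?" needs two joins.
JOIN_SQL = """
SELECT g.name AS gene_name, t.name AS transcript_name, x.external_id AS probe_id
FROM gene g
JOIN transcript t ON t.gene_id = g.gene_id
JOIN xref x ON x.transcript_id = t.transcript_id
"""
normalised_rows = cur.execute(JOIN_SQL + " ORDER BY transcript_name").fetchall()

# De-normalised "mart" table: built once from the normalised tables,
# so later queries read a single wide table with no joins.
cur.execute("CREATE TABLE gene_mart AS " + JOIN_SQL)
mart_rows = cur.execute(
    "SELECT gene_name, transcript_name, probe_id FROM gene_mart "
    "ORDER BY transcript_name"
).fetchall()

print(mart_rows)  # 'GENE_A' appears on every row: the apparent redundancy
conn.close()
```

If the normalised tables change, `gene_mart` is stale until it is rebuilt, which corresponds to the rebuild-per-release point on the slide.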

  34. Information Flow (diagram) • DATASET: species focus • FILTER panels: REGION, GENE, EXPRESSION, HOMOLOGY, PROTEIN, SNP • ATTRIBUTES panels: REGION, GENE, EXPRESSION, HOMOLOGY, PROTEIN, SNP • Other labels in the diagram: SWISSPROT, EMBL, REFSEQ, GO, INTERPRO, AFFYMETRIX; FASTA, GTF, HTML, TEXT, EXCEL, FILE