
Managing Data Modeling

Managing Data Modeling. GO Workshop, 3-6 August 2010. Managing Data: functional modeling strategy; converting between database IDs (Ensembl BioMart, UniProt, DAVID, AgBase ArrayIDer); arrays; examples to work on. Types of data sets and modeling.


Presentation Transcript


  1. Managing Data Modeling GO Workshop 3-6 August 2010

  2. Managing Data • Functional modeling strategy • Converting between Database IDs • Ensembl Biomart • UniProt • DAVID • AgBase ArrayIDer • Arrays • examples to work on

  3. Types of data sets and modeling • Commercial array data – more likely to have ID mapping to support functional modeling. • Custom/USDA array data – may need to do your own ID mapping: see examples on workshop page. • Proteomics data • RNA-Seq data sets – computational pipelines to assign GO (GOanna is limited; contact AgBase). • Real-time data or quantitative proteomics data – hypothesis testing.

  4. Overview of Functional Modeling Strategy • Microarray IDs: ArrayIDer • Protein/Gene identifiers: GORetriever • Genes/Proteins with no GO annotations: GOanna • GO annotations • Summarizes GO function: GOSlimViewer • GO enrichment analysis: Ingenuity Pathways Analysis (IPA), Pathway Studio, Cytoscape, DAVID, EasyGO/AgriGO, Onto-Express, Onto-Express-to-go (OE2GO) • Pathways and network analysis: Ingenuity Pathways Analysis (IPA), Pathway Studio, Cytoscape, DAVID • Hypothesis testing: GOModeler • Yellow boxes represent AgBase tools • Green/purple boxes are non-AgBase resources

  5. Functional Modeling Considerations • Should I add my own GO? • use GOSlimViewer to see how much GO is available for your species • use GORetriever to see how much GO is available for your dataset • Should I do GO analysis and pathway analysis and network analysis? • different functional modeling methods show different aspects about your data (complementary) • is this type of data available for your species (or a close ortholog)? • What tools should I use? • which tools have data for your species of interest? • what type of accessions are accepted? • availability (commercial and freely available)

  6. structurally and functionally re-annotated a microarray • quantified the impact of this re-annotation based on GO annotations & pathways represented on the array • tested using a previously published experiment that used this microarray • re-annotation allows more comprehensive GO based modeling and improves pathway coverage • re-annotation resulted in a different model from previously published research findings

  7. Converting accessions • Depending on your data set and the tools you use, you are likely to need to convert between database accessions to do your functional modeling. • UniProt database – ID mapping tab • Ensembl BioMart • Online analysis tools: • DAVID • g:Profiler • GORetriever • ArrayIDer – converts EST accessions for some species (by request)

  8. ID Conversion / ID Mapping • Data types: commercial arrays, custom arrays, EST arrays, proteomics, RNA-Seq data • Conversion resources: commercial ID mapping (e.g. NetAffx), Ensembl BioMart, online tools (g:Convert, DAVID), ArrayIDer, UniProt

  9. Working on your own data: • New to GO • GO browser tutorials to familiarize yourself with the GO • learn what GO is available for your species • Your own data set • functional grouping to get an overview (e.g. GOSlimViewer) • GO enrichment analysis (tools available for your species) • Pathway analysis • Example data sets available – use as worked examples

  10. Working on your own data: • New to GO • GO browser tutorials to familiarize yourself with the GO • learn what GO is available for your species • Your own data set • functional grouping to get an overview (e.g. GOSlimViewer) • GO enrichment analysis (tools available for your species) • Pathway analysis • Example data sets available – use as worked examples • Most of these tools (including pathway analysis tools) accept only certain database accessions, so you will need to convert accessions between databases.

  11. Example: ID conversion • Ensembl Plants BioMart tool • currently covers a limited number of species, but Ensembl is adding more plants • BioMart allows sophisticated querying of genomic data • DAVID ID conversion tool • allows users to convert IDs and do GO enrichment analysis • UniProt ID conversion • highly annotated data • ArrayIDer • links ESTs to public database IDs

  12. http://plants.ensembl.org/index.html NOTE: Ensembl is adding new plant species…

  13. 1. Ensembl BioMart

  14. Clicking on these headings allows you to set up searches. Selecting FILTERS gives you different filtering options:

  15. Expand GENE and check “ID list limit” to select a defined list of accessions. Enter your list of accessions.

  16. Selecting ATTRIBUTES allows you to choose what information is reported: Check accessions from external databases (UniProt & RefSeq).

  17. Clicking on RESULTS will show you the output information. • Output can be displayed online and/or downloaded (text, Excel). • Selecting FILTERS or ATTRIBUTES will allow you to go back and make changes. • Limited to species represented in Ensembl
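The FILTERS / ATTRIBUTES / RESULTS steps above can also be written down as a BioMart XML query, which makes a conversion easier to repeat later. The following is a minimal sketch, not the workshop's own method. The dataset name, filter name, attribute names, and gene IDs are hypothetical placeholders, and the `virtualSchemaName` value is an assumption that may differ between BioMart servers. The real names should be copied from the BioMart web interface for your species.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, id_filter, accessions, attributes):
    """Build a BioMart XML query string from a filter list and attribute list."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",  # assumption: server-specific value
        "formatter": "TSV",              # tab-separated output
        "header": "1",                   # include a header row
        "uniqueRows": "1",               # drop duplicate rows
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # FILTERS step: limit the search to a defined list of accessions
    ET.SubElement(ds, "Filter", {"name": id_filter, "value": ",".join(accessions)})
    # ATTRIBUTES step: one element per column to report
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    prolog = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return prolog + ET.tostring(query, encoding="unicode")


# All names below are hypothetical placeholders, not real BioMart identifiers.
query_xml = build_biomart_query(
    dataset="example_gene_dataset",
    id_filter="example_gene_id_filter",
    accessions=["GENE0001", "GENE0002"],
    attributes=["example_gene_id", "example_uniprot_attr", "example_refseq_attr"],
)
print(query_xml)
```

BioMart servers generally accept such a string as the `query` parameter of their `martservice` URL. That network step is left out here so the sketch runs offline.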

  18. 2. Online analysis tools Database for Annotation, Visualization and Integrated Discovery (DAVID) http://david.abcc.ncifcrf.gov/conversion.jsp This tool works for a wide range of species.

  19. Paste in your accession list (You can also upload a file of accessions.)

  20. Select accession type. NOTE: If you choose “Not Sure”, the tool will try to decide what type of accession you have.

  21. Select gene list. Submit list.

  22. Select the type of accession you want to convert TO.

  23. Any ambiguous IDs are listed for you to decide.
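If a conversion result is downloaded as a list of (original ID, converted ID) pairs, ambiguous cases can also be flagged outside the web tool. The sketch below is one possible way to do that and does not describe how DAVID itself decides ambiguity. The IDs in the example are hypothetical.

```python
from collections import defaultdict


def split_ambiguous(pairs):
    """Separate one-to-one conversions from IDs that map to several targets."""
    targets = defaultdict(set)
    for source_id, target_id in pairs:
        targets[source_id].add(target_id)
    # IDs with exactly one distinct target are safe to use directly
    unique = {s: next(iter(t)) for s, t in targets.items() if len(t) == 1}
    # IDs with more than one target need a manual decision
    ambiguous = {s: sorted(t) for s, t in targets.items() if len(t) > 1}
    return unique, ambiguous


# Hypothetical probe IDs and gene IDs, for illustration only.
pairs = [
    ("probe_1", "geneA"),
    ("probe_2", "geneB"),
    ("probe_2", "geneC"),
    ("probe_1", "geneA"),  # repeated rows do not make an ID ambiguous
]
unique, ambiguous = split_ambiguous(pairs)
print("unique:", unique)
print("needs review:", ambiguous)
```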

  24. 3. UniProt ID Mapping

  25. Paste accession list (>1000 may cause errors). COMMENT: Note the difference between UniProt accessions and UniProt IDs. UniProt accessions are short strings of letters and numerals, 6 or 10 characters long. UniProt IDs (entry names) have a suffix related to the species name. E.g. cassava hydroxynitrilase: Accession: P52705, ID: HNL_MANES
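Because the two identifier types are easy to mix up, it can help to check a list before pasting it in. The following is a rough sketch: the accession pattern follows the accession format published in the UniProt documentation, while the entry-name pattern and the helper names are assumptions made for illustration. The batch size of 1000 reflects the limit mentioned on the slide.

```python
import re

# Accession pattern as published by UniProt (6 or 10 characters).
ACCESSION_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
# Assumed entry-name shape: mnemonic, underscore, species code (e.g. HNL_MANES).
ENTRY_NAME_RE = re.compile(r"^[A-Z0-9]{1,10}_[A-Z0-9]{1,5}$")


def classify(identifier):
    """Return 'accession', 'entry_name', or 'other' for one identifier."""
    if ACCESSION_RE.match(identifier):
        return "accession"
    if ENTRY_NAME_RE.match(identifier):
        return "entry_name"
    return "other"


def batches(identifiers, size=1000):
    """Yield consecutive chunks of at most `size` identifiers."""
    for start in range(0, len(identifiers), size):
        yield identifiers[start:start + size]


for identifier in ["P52705", "HNL_MANES", "AAP50233.1"]:
    print(identifier, classify(identifier))
```

Submitting each batch separately keeps every request at or below the size that the slide warns about.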

  26. Select the accession type you have: and the accession type you want to convert to: Click on MAP

  27. The mapping link will display a tab-separated file that can be opened in Excel:
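The same tab-separated output can be read in a script, which makes it easy to see which submitted accessions returned no mapping. This is a minimal sketch that assumes a two-column file with a header row (shown here as `From` and `To`). The second submitted ID is a hypothetical placeholder for an unmapped entry.

```python
import csv
import io


def read_mapping(tsv_text):
    """Parse two-column tab-separated mapping text into {source: [targets]}."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header row (assumed to be present)
    mapping = {}
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        source, target = row[0].strip(), row[1].strip()
        mapping.setdefault(source, []).append(target)
    return mapping


def unmapped(submitted, mapping):
    """Return the submitted IDs that are missing from the mapping."""
    return [identifier for identifier in submitted if identifier not in mapping]


example_tsv = "From\tTo\nP52705\tHNL_MANES\n"
submitted = ["P52705", "EXAMPLE_UNMAPPED_ID"]  # second ID is a placeholder
mapping = read_mapping(example_tsv)
print(mapping)
print("unmapped:", unmapped(submitted, mapping))
```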

  28. 4. AgBase: ArrayIDer Maps ESTs to gene/protein accessions. Contact AgBase to request additional species.

  29. Upload a list of dbEST accessions or EST names.

  30. An email will be sent with a link to the results. Results are formatted as an Excel file.

  31. For additional help with database accessions please contact AgBase.

  32. Working on your own data: NOTE: • Always keep note of what tool you used to do the accession ID mapping/conversion and its version/update/date. • Keep a copy of your original IDs and what they mapped to so that you can refer back to this during your modeling.
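One way to follow both notes is to store the mapping together with a small provenance record. The sketch below is only an illustration: the field names are arbitrary choices, and the version string, date, and unmapped ID are placeholders rather than real values.

```python
import json
from datetime import date


def build_mapping_record(original_ids, mapping, tool, version, run_date):
    """Combine original IDs, their mappings, and tool details in one record."""
    rows = [
        {"original_id": original, "mapped_ids": sorted(mapping.get(original, []))}
        for original in original_ids  # every original ID is kept, mapped or not
    ]
    return {
        "tool": tool,
        "version": version,
        "date": run_date.isoformat(),
        "n_submitted": len(original_ids),
        "n_unmapped": sum(1 for row in rows if not row["mapped_ids"]),
        "rows": rows,
    }


record = build_mapping_record(
    original_ids=["P52705", "EXAMPLE_UNMAPPED_ID"],  # second ID is a placeholder
    mapping={"P52705": ["HNL_MANES"]},
    tool="UniProt ID mapping",
    version="placeholder-release",                   # record the real release here
    run_date=date(2010, 8, 3),                       # placeholder date
)
print(json.dumps(record, indent=2))
```

Keeping unmapped IDs in the record, rather than dropping them, makes it possible to refer back to the full original list during modeling.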

  33. Tutorial 1: ID conversion The AgriGO GO enrichment analysis tool accepts the following inputs for rice: • GenBank ID: AAP50233.1 • DDBJ ID: BAB11514.1 • EMBL ID: CAA18188.1 • UniProt ID: Q9LYA9 • RefSeq Peptide ID: NP_564434 We will convert a list of Rice Affy IDs to these IDs for use in the AgriGO tool.

  34. Arrays: ID Mapping • “annotation” file that shows which database accessions the probes were based on • array annotation files may include multiple database IDs • Commercial arrays – may be updated regularly • Custom/Research arrays – not updated as often • Always check when the last ID mapping was updated, as this data changes continually

  35. Array annotation available: • FHCRC chicken 13K (GPL2863) • Agilent-015068 Chicken Gene Expression Microarray 4x44k (GPL8764) • Avian Innate Immunity Microarray (AIIM) (GPL1461) • Affymetrix Chicken Genome Array (GPL3213*) • UIUC Bos taurus 13.2K 70-mer oligoarray (GPL2853) • Affymetrix Bovine Genome Array (GPL2112) • Agilent-015354 Bovine Oligo Microarray (4x44K) • Equine Whole Genome Oligonucleotide (EWGO) array Array annotation in progress: • ARK-Genomics G. gallus 20K v1.0 (GPL5480) • FHCRC Chicken 13K v2.0 (GPL1836) • Chicken cDNA DDMET 1700 array version 1.0 (GPL3265)

  36. Tutorial 1: ID conversion Work through tutorial 1 on the workshop website. Alternatively – work on your own data set during this time, using the tutorial as a guide.
