
Dataflow

Dataflow. Karyn Mégy, VectorBase (http://www.vectorbase.org). Who? EBI: Dan Lawson, Karyn Megy. Notre Dame: Dave Campbell. Harvard: Dave Emmert, Pinglei Zhou, Lynn Crosby, Susan Russo. UNM: Phil Baker. Aim: improve automatic annotation


Presentation Transcript


  1. Dataflow. Karyn Mégy, VectorBase, http://www.vectorbase.org

  2. Who?
      • EBI: Dan Lawson, Karyn Megy
      • Notre Dame: Dave Campbell
      • Harvard: Dave Emmert, Pinglei Zhou, Lynn Crosby, Susan Russo
      • UNM: Phil Baker

  3. Aim
      • Improve automatic annotation
      • Resolve conflicting community annotations (same locus)

  4. Manual/Community Annotation Dataflow: Improving the gene set
      [Diagram: Harvard, UNM, Apollo, EBI, Ensembl DB, Notre Dame (CAP) and the CAP Community, exchanging data as ChadoXML, ChadoXML (gff), gff and DAS]

  5. Potential issues
      1. New gene build: generating vs. annotating
      2. Patch build issues = conflict in manual annotation (1 locus, > 1 gene submitted)
      3. Modifying submitted models
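The patch build issue above (1 locus, > 1 gene submitted) can be sketched as a small check. This is only an illustration, not the VectorBase pipeline: the `GeneModel` fields, the `find_conflicts` helper, and the rule that "same locus" means overlapping coordinates on the same sequence region and strand are all assumptions, and the coordinates and model ids are made up.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class GeneModel:
    """Hypothetical record for one submitted gene model."""
    model_id: str
    seq_region: str   # chromosome arm or scaffold name
    strand: int       # +1 or -1
    start: int
    end: int
    submitter: str


def find_conflicts(models):
    """Return groups of models whose spans overlap on the same region and strand.

    Assumption: two models share a locus if their [start, end] spans overlap.
    """
    ordered = sorted(models, key=lambda m: (m.seq_region, m.strand, m.start))
    conflicts = []
    for _, group in groupby(ordered, key=lambda m: (m.seq_region, m.strand)):
        cluster, cluster_end = [], None
        for m in group:
            if cluster and m.start <= cluster_end:
                # Overlaps the current cluster: same locus.
                cluster.append(m)
                cluster_end = max(cluster_end, m.end)
            else:
                # Close the previous cluster and start a new one.
                if len(cluster) > 1:
                    conflicts.append(cluster)
                cluster, cluster_end = [m], m.end
        if len(cluster) > 1:
            conflicts.append(cluster)
    return conflicts


# Illustrative submissions only; ids and coordinates are invented.
models = [
    GeneModel("m1", "2L", 1, 100, 500, "Harvard"),
    GeneModel("m2", "2L", 1, 400, 900, "CAP"),
    GeneModel("m3", "2L", -1, 450, 800, "CAP"),
    GeneModel("m4", "3R", 1, 100, 200, "UNM"),
]
conflicts = find_conflicts(models)
for cluster in conflicts:
    print([m.model_id for m in cluster])
```

Here only `m1` and `m2` are reported, since `m3` lies on the opposite strand and `m4` on another region. Whether opposite-strand overlaps should also count as a conflict is a curation decision the slides do not settle.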

  6. 1a. Generate a new genebuild
      E.g. Anopheles AgamP3.6 -> AgamP3.7
      [Diagram: Harvard, UNM, Apollo, EBI, Ensembl DB and Notre Dame (CAP), exchanging ChadoXML, ChadoXML (gff) and gff; steps numbered 1 to 5, with a DATA FREEZE between AgamP3.6 and AgamP3.7]
      => Data freeze

  7. 1b. Annotate a new genebuild
      E.g. Anopheles AgamP3.6 -> AgamP3.7
      [Diagram: Harvard, UNM, Apollo, EBI, Ensembl DB (AgamP3.7) and Notre Dame (CAP), exchanging ChadoXML, ChadoXML (gff) and gff; steps numbered 0 to 4]
      => New dump

  8. Patch build issues

  9. 2. Patch build issues
      E.g. Anopheles AgamP3.6 -> AgamP3.7
      [Diagram: Harvard, UNM, Apollo, EBI, Ensembl DB (AgamP3.x) and Notre Dame (CAP), exchanging ChadoXML, ChadoXML (gff), DAS/gff and gff; steps numbered 1 to 4]

  10. 3. Modifying submitted gene models
      [Diagram: Harvard, UNM, Apollo, EBI, Ensembl DB and Notre Dame (CAP), exchanging ChadoXML, with three annotated checkpoints, listed here in the order they appear on the slide]
      • If past this point: Harvard trick
      • If past this point: submit as new model; integration of new model (EBI trick)
      • If past this point: no integration of new model; submit as new model

  11. Considerations
      • Timeline
      • Bulk data from UNM every x months? = New gene set every x months
      • Manual model revision
      • More and newer data for automatic predictions
      • Should we revise manual annotations?
      • When/should manual annotations lose their golden label?
