
The pPOD Core Data Model

The pPOD Core Data Model. The pPOD CDM team: Bill Piel, Shirley Cohen, Tim McPhillips, Shawn Bowers, Sarah Cohen-Boulakia, Val Tannen


Presentation Transcript


  1. The pPOD Core Data Model. The pPOD CDM team: Bill Piel, Shirley Cohen, Tim McPhillips, Shawn Bowers, Sarah Cohen-Boulakia, Val Tannen. Special thanks to Brent Mishler, David Maddison, Jeff Oliver, Rutger Vos, Francois Lutzoni, Martin Ramirez, Jonathan Coddington, Wayne Maddison, Fan Ge, Ashley Green, Jin Ruan, Martin Wu, John Lundberg, John Sullivan.

  2. Goals
     The Core Data Model (CDM) under development in the pPOD project will serve the following purposes:
     • It will allow experimentation with the modeling of provenance in phylogenetic pipelines.
     • It will serve as a schema for a persistence tool that works (1) in standalone mode, (2) with our lab notebook suite, and (3) integrated with Mesquite as a module.
     • It will serve as a target for schema mappings used to connect other AToL databases, resources such as TreeBASE, etc., using the Orchestra integration engine.

  3. The Role of Provenance
     • Backwards provenance “query”: starting from a research “product”, e.g., a tree, a supertree, or a matrix, track backwards through stored objects to all the raw input information that led to this product.
     • Forwards provenance “query”: starting from a raw input, e.g., a specimen, an image, or a sequence, track forwards through stored objects to all research products that this input contributed to.
     In both cases, navigate biological assumptions in both directions, e.g., homology assumptions.
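The two provenance "queries" described on this slide can be read as reachability questions on a directed graph of stored objects. The following is a minimal sketch under that assumption, not the pPOD implementation: the class `ProvenanceGraph`, its method names, and the links between the sample objects are hypothetical, and the object labels are borrowed from later slides purely for illustration.

```python
from collections import defaultdict, deque


class ProvenanceGraph:
    """Hypothetical store of 'product was derived from input' links."""

    def __init__(self):
        self._derived_from = defaultdict(set)    # product -> direct inputs
        self._contributes_to = defaultdict(set)  # input -> direct products

    def record(self, product, inputs):
        """Register that `product` was obtained directly from each of `inputs`."""
        for source in inputs:
            self._derived_from[product].add(source)
            self._contributes_to[source].add(product)

    @staticmethod
    def _reach(start, edges):
        """Breadth-first search: every object reachable from `start` via `edges`."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def backwards(self, product):
        """Everything that led to `product` (backwards provenance)."""
        return self._reach(product, self._derived_from)

    def forwards(self, raw_input):
        """Everything `raw_input` contributed to (forwards provenance)."""
        return self._reach(raw_input, self._contributes_to)

    def raw_inputs(self, product):
        """Ancestors of `product` that have no recorded inputs of their own."""
        return {n for n in self.backwards(product) if not self._derived_from.get(n)}


# Illustrative links only; the real relationships are not given in the transcript.
g = ProvenanceGraph()
g.record("tree T123", ["matrix M456"])
g.record("matrix M456", ["cell(0,0)", "cell(28,23)"])
g.record("cell(0,0)", ["img 193"])
g.record("img 193", ["spec 19"])
g.record("cell(28,23)", ["img 206"])
g.record("img 206", ["spec 20"])

print(sorted(g.raw_inputs("tree T123")))
print(sorted(g.forwards("spec 20")))
```

Keeping both edge directions in the store makes the forwards and backwards traversals symmetric, at the cost of writing each link twice.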

  4. [Architecture diagram. Labels: store commands; provenance query; query (phylogenetic query language); AToL; AAA; schema mappings; TreeBASE; persistence manager; RDBMS; Persistence Tool; CDM (an OO schema); Kepler-based workflow tool; Mesquite module.]

  5. AToL data that needs to be modeled in the CDM (not an exhaustive list). Analyzed data: trees, matrices, cells, (row) segments, operational taxonomic units (OTUs), taxa, standard characters and their states, genes, gene fragments. Raw data: standard views, images, sequences, chromatograms, primers, specimens, samples, collections.

  6. CDM: Phylogeny Inference Data. Analyzed data: trees, matrices, operational taxonomic units (OTUs), standard taxa. [Class diagram. Labels: Tree, provenance, authority, StdTaxon, Matrix, isA, Set, taxon, OTU, List, StdMatrix, SeqMatrix.]

  7. Modeling Provenance (1). [Diagram labels: provenance, Tree, Matrix.] …but also… Software (Parameters), Author, Date. Must be modeled and stored explicitly! But it can be provided by automatic workflow tools.

  8. “Kinds” of Provenance
     In our CDM tools
     • Relationship between stored objects
       • E.g., tree T123 was obtained from matrix M456 by Joe Bio on 01/31/2001 using PAUP with parameters… (see previous slide)
     • Tracking through copy or cut/paste operations, possibly across repositories
     • Trace of data moving through a workflow
       • Sequence of timestamps, tool invocations (parameters), authors
     • Trace of data through a logically expressed view/query
       • Can be computed automatically as the view/query output is computed
     In our workflow tool
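The "relationship between stored objects" example on this slide, together with the software, author, and date fields named on the previous slide, could be stored as an explicit record. The sketch below is one possible shape and not the CDM's actual schema: `ProvenanceRecord` and all of its field names are hypothetical, and the date is read as month/day/year. The parameters are elided on the slide, so they are left empty here.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ProvenanceRecord:
    """Hypothetical explicit provenance link between stored objects."""
    product_id: str
    source_ids: list
    author: str
    performed_on: date
    software: str
    parameters: dict = field(default_factory=dict)  # elided on the slide

    def describe(self) -> str:
        """Render the record as a sentence in the style of the slide."""
        sources = ", ".join(self.source_ids)
        return (f"{self.product_id} was obtained from {sources} by {self.author} "
                f"on {self.performed_on.isoformat()} using {self.software}")


# The slide's example; 01/31/2001 is interpreted as 31 January 2001.
record = ProvenanceRecord(
    product_id="tree T123",
    source_ids=["matrix M456"],
    author="Joe Bio",
    performed_on=date(2001, 1, 31),
    software="PAUP",
)
print(record.describe())
```

A workflow tool could emit such records automatically at each tool invocation, which matches the slide's remark that explicitly stored provenance need not be entered by hand.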

  9. CDM: Morphological Data. Analyzed data: standard matrices, cells, standard characters and their states. Raw data: standard views, images, specimens, collections.

  10. [Class diagram for morphological data. Labels: prov, OTU, Specimen, Collection, List, Matrix, StdMatrix, Cell, code(states), Set, Image, StdChar, StdView, states : List<string>.]

  11. Modeling Provenance (2). [Example provenance diagram. Labels: tree T123; matrix M456; cell(0,0), cell(28,23), cell(28,45); img 193, img 194, img 204, img 206, img 211; spec 19, spec 20, spec 21.]

  12. Example of Phylogenetic Query Find all standard matrices with some character C whose label contains the substring "elytra" and some OTU whose state for character C contains the substring "transverse"; return all such matrices, together with their characters, OTUs and states satisfying the conditions.

  13. Semi-formalized (OQL) query example
      SELECT M, label of C, label of X, label of state encoded in cell E
      FROM M over all standard matrices, C over all characters of M, X over all OTUs of M, E is the cell corresponding to C and X in M
      WHERE the label of C is like "*elytra*" AND the label of the state encoded in cell E is like "*transverse*"
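To make the semi-formal query concrete, here is a minimal in-memory sketch of the same selection. It is not the pPOD phylogenetic query language: the classes `StdChar` and `StdMatrix` only borrow names from the class diagram, their fields are assumptions, each cell is simplified to hold a single state index, and the sample matrices are invented for illustration.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class StdChar:
    label: str
    states: list  # state labels; assumed to belong to the character


@dataclass
class StdMatrix:
    name: str
    characters: list  # list of StdChar
    otus: list        # OTU labels
    # (otu_index, char_index) -> index into that character's states (simplified)
    cells: dict = field(default_factory=dict)


def find_matches(matrices, char_pattern, state_pattern):
    """Return (matrix, character label, OTU label, state label) for every match."""
    results = []
    for m in matrices:                                   # M over all standard matrices
        for ci, c in enumerate(m.characters):            # C over all characters of M
            if not fnmatchcase(c.label, char_pattern):
                continue
            for oi, otu in enumerate(m.otus):            # X over all OTUs of M
                code = m.cells.get((oi, ci))             # E: the cell for C and X
                if code is None:
                    continue
                state = c.states[code]
                if fnmatchcase(state, state_pattern):
                    results.append((m.name, c.label, otu, state))
    return results


# Invented sample data, for illustration only.
beetles = StdMatrix(
    name="M456",
    characters=[StdChar("elytra shape", ["rounded", "transverse ridges"]),
                StdChar("antenna length", ["short", "long"])],
    otus=["OTU1", "OTU2"],
    cells={(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 1},
)
spiders = StdMatrix(
    name="M789",
    characters=[StdChar("leg count", ["eight"])],
    otus=["OTU3"],
    cells={(0, 0): 0},
)

matches = find_matches([beetles, spiders], "*elytra*", "*transverse*")
print(matches)
```

The nested loops mirror the FROM clause one range variable at a time, and the two `fnmatchcase` checks correspond to the two `like` conditions in the WHERE clause.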

  14. Molecular Data. Analyzed data: sequence matrices, (row) segments, genes, gene fragments. Raw data: sequences, chromatograms, primers, specimens, samples, collections.

  15. [Diagram of a molecular matrix. Labels: gene frag 1, gene frag 2; OTU1, OTU2; "from some contig"; "a row segment (from some sequence)"; "from different specimens".]

  16. [Class diagram for molecular data. Labels: prov???, prov, List, Set, isA, Row Segment (endPos : int), SeqMatrix, ColumnSeg (endPos : int), OTU, Contig, Raw Sequence, Protein, GeneFragment, Primer, Chromatogram, Collection, Specimen, Sample.]
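The molecular slides suggest that one OTU row of a sequence matrix is assembled from several row segments, each carrying an `endPos` and its own provenance, possibly from different specimens. The sketch below shows one way such a row could be navigated. It rests on assumptions the transcript does not confirm: `endPos` is treated as the exclusive end column of a segment within the row, segments are stored in column order, and the fields `sequence_id` and `specimen_id` are hypothetical stand-ins for the `prov` links.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class RowSegment:
    end_pos: int      # assumed: exclusive end column of this segment in the row
    sequence_id: str  # hypothetical provenance link to a sequence or contig
    specimen_id: str  # hypothetical provenance link to a specimen


def segment_at(segments, column):
    """Return the segment covering `column`; `segments` must be in column order."""
    ends = [s.end_pos for s in segments]
    index = bisect_right(ends, column)  # first segment whose end_pos > column
    if column < 0 or index == len(segments):
        raise IndexError(f"column {column} is outside the row")
    return segments[index]


def specimens_of(segments):
    """All specimens that contributed to one OTU row."""
    return {s.specimen_id for s in segments}


# Invented example: one OTU row built from two gene fragments.
otu1_row = [
    RowSegment(end_pos=600, sequence_id="seq A", specimen_id="specimen A"),
    RowSegment(end_pos=1400, sequence_id="seq B", specimen_id="specimen B"),
]

print(segment_at(otu1_row, 599).specimen_id)
print(segment_at(otu1_row, 600).specimen_id)
print(sorted(specimens_of(otu1_row)))
```

Storing only the end position keeps segments contiguous by construction, since each segment implicitly starts where the previous one ends.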
