This document outlines the requirements, structure, and proposed architecture of the Tcl-DB, a prototype taxonomy database designed to enhance biological data management. It details the hierarchy, alternative search terms, and classifications for entries. The emphasis is on the database's extensibility, UID tracking, and utility in computing statistics related to taxonomy overlap and uniqueness among datasets such as ITIS and NCBI. Suggestions for future work include interface building, data updates, and considerations for phylogenetic coding.
Outline
• Requirements
• Hierarchy
• Alternative Search Terms: Synonyms and Vernaculars
• Alternative Spellings
• Alternative Classifications
• Tcl-DB Prototype System
• Tcl-DB Structure
• 2NF
• Extensible: Adding a new data source, e.g. NCBI
• Tcl-DB: UID Tracking
• Tcl-DB: Stats
• Utility and Further Work
3. Alternative Spellings: Caenorabditis elegans, C elegans and Caenorhabditis elegans
Assertion: Resolving the M:M with an association entity
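A minimal sketch of the association-entity idea, assuming the many-to-many relationship is between names and the data sources that assert them. The slide does not say which two entities `assertion` links, so the `source` table, the column names, and the toy rows below are all hypothetical. SQLite stands in for the real database.

```python
import sqlite3

# Hypothetical schema: the deck only names the association entity "assertion".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE name (
        name_id   INTEGER PRIMARY KEY,
        name_text TEXT NOT NULL UNIQUE
    );
    CREATE TABLE source (
        source_id   INTEGER PRIMARY KEY,
        source_name TEXT NOT NULL UNIQUE
    );
    -- Association entity: one row per (name, source) pair, which turns
    -- the M:M relationship into two 1:M relationships.
    CREATE TABLE assertion (
        name_id   INTEGER NOT NULL REFERENCES name(name_id),
        source_id INTEGER NOT NULL REFERENCES source(source_id),
        PRIMARY KEY (name_id, source_id)
    );
    -- Toy rows for illustration only.
    INSERT INTO name VALUES (1, 'Caenorhabditis elegans'),
                            (2, 'environmental sample 256');
    INSERT INTO source VALUES (1, 'ITIS'), (2, 'NCBI');
    INSERT INTO assertion VALUES (1, 1), (1, 2), (2, 2);
""")


def sources_for_name(name_text):
    """Return the sorted source names that assert the given name."""
    rows = conn.execute(
        """
        SELECT s.source_name
        FROM name n
        JOIN assertion a ON a.name_id = n.name_id
        JOIN source s    ON s.source_id = a.source_id
        WHERE n.name_text = ?
        ORDER BY s.source_name
        """,
        (name_text,),
    ).fetchall()
    return [r[0] for r in rows]
```

The composite primary key on `assertion` prevents the same name from being asserted twice by one source.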
Node: Hierarchical Queries (Nested Set, Path and Connect By)
> select count(name_id) from node start with name_id = '100891' connect by prior name_id = parent_name_id;
> select count(name_id) from node where path like '/%';
> select count(name_id) from node where left_id between 1 and 9290;
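The three queries above count the same subtree in three ways. The sketch below reproduces that on a four-node toy tree in SQLite. It rests on several assumptions: `CONNECT BY` is Oracle syntax, so a recursive common table expression is swapped in for the parent-link query; the `right_id` column and the `/id/id/` path format are guesses at the `node` layout; the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE node (
        name_id        INTEGER PRIMARY KEY,
        parent_name_id INTEGER,   -- adjacency list (used by CONNECT BY)
        path           TEXT,      -- materialized path
        left_id        INTEGER,   -- nested set bounds
        right_id       INTEGER
    )
""")
# Toy tree: 1 is the root, 2 and 3 are its children, 4 is a child of 2.
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?, ?, ?)",
    [
        (1, None, "/1/", 1, 8),
        (2, 1, "/1/2/", 2, 5),
        (4, 2, "/1/2/4/", 3, 4),
        (3, 1, "/1/3/", 6, 7),
    ],
)


def count_by_parent_link(root_id):
    """Recursive CTE, standing in for Oracle's START WITH / CONNECT BY."""
    sql = """
        WITH RECURSIVE subtree(name_id) AS (
            SELECT name_id FROM node WHERE name_id = ?
            UNION ALL
            SELECT n.name_id
            FROM node n JOIN subtree s ON n.parent_name_id = s.name_id
        )
        SELECT count(name_id) FROM subtree
    """
    return conn.execute(sql, (root_id,)).fetchone()[0]


def count_by_path(root_path):
    """Materialized path: every descendant's path starts with the root's."""
    sql = "SELECT count(name_id) FROM node WHERE path LIKE ?"
    return conn.execute(sql, (root_path + "%",)).fetchone()[0]


def count_by_nested_set(left_id, right_id):
    """Nested set: descendants have left_id inside the root's interval."""
    sql = "SELECT count(name_id) FROM node WHERE left_id BETWEEN ? AND ?"
    return conn.execute(sql, (left_id, right_id)).fetchone()[0]
```

On this toy tree, `count_by_parent_link(2)`, `count_by_path("/1/2/")` and `count_by_nested_set(2, 5)` all count the subtree rooted at node 2. The path and nested-set forms avoid recursion at query time but must be recomputed when the hierarchy changes.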
synonym_name and vernacular: subtypes, multi-valued attributes, or weak entities
Tcl-DB: Procedures, Packages and Functions: Adding a new data source e.g. NCBI
Step 4: fill synonym_name table in tcl schema
Step 5: fill vernacular table in tcl schema
Tcl-DB: UID Tracking
• After the name data load:
• Run two joins on name and nids_mv
• NIDs: the name_ids whose name_text already exists
• Null: the name_ids whose name_text does not exist
• Update name and give all new names a NID
• Update name and give all names their original NID
• Refresh the NID_view
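The UID-tracking steps above can be sketched as follows, assuming `nids_mv` maps each `name_text` to a stable NID and that a reload may hand out different `name_id` values. The column layout, the "max NID plus one" numbering for new names, and the use of a plain table in place of a materialized view are all assumptions made for illustration.

```python
import sqlite3


def build_demo():
    """Create a toy name table (freshly reloaded) and a prior NID snapshot."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE name (
            name_id   INTEGER PRIMARY KEY,
            name_text TEXT NOT NULL UNIQUE,
            nid       INTEGER
        );
        -- Stand-in for the materialized view of previously issued NIDs.
        CREATE TABLE nids_mv (nid INTEGER PRIMARY KEY, name_text TEXT UNIQUE);
        INSERT INTO nids_mv VALUES (1, 'Caenorhabditis elegans'),
                                   (2, 'Homo sapiens');
        -- After the reload the name_ids differ and one name is new.
        INSERT INTO name VALUES (10, 'Homo sapiens', NULL),
                                (11, 'Caenorhabditis elegans', NULL),
                                (12, 'environmental sample 256', NULL);
    """)
    return conn


def assign_nids(conn):
    # Names whose name_text already exists get their original NID back.
    conn.execute("""
        UPDATE name
        SET nid = (SELECT m.nid FROM nids_mv m
                   WHERE m.name_text = name.name_text)
        WHERE EXISTS (SELECT 1 FROM nids_mv m
                      WHERE m.name_text = name.name_text)
    """)
    # Names with no match (nid still NULL) are new and get fresh NIDs.
    next_nid = conn.execute(
        "SELECT COALESCE(MAX(nid), 0) + 1 FROM nids_mv"
    ).fetchone()[0]
    new_ids = conn.execute(
        "SELECT name_id FROM name WHERE nid IS NULL ORDER BY name_id"
    ).fetchall()
    for offset, (name_id,) in enumerate(new_ids):
        conn.execute(
            "UPDATE name SET nid = ? WHERE name_id = ?",
            (next_nid + offset, name_id),
        )
    # "Refresh": add the newly issued NIDs to the snapshot.
    conn.execute("""
        INSERT INTO nids_mv (nid, name_text)
        SELECT nid, name_text FROM name
        WHERE nid NOT IN (SELECT nid FROM nids_mv)
    """)
    conn.commit()


conn = build_demo()
assign_nids(conn)
tracked = conn.execute(
    "SELECT name_id, nid FROM name ORDER BY name_id"
).fetchall()
```

Matching on `name_text` rather than `name_id` is what keeps a NID stable across reloads in this sketch.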
Tcl-DB: Utility and Further Work
• Computing interesting stats:
• How much overlap is there between ITIS and NCBI?
• How many names are unique to NCBI?
• How many of these are binomials vs. names like ‘environmental sample 256’?
• How many of these names can be matched allowing for 1 to 3 letter mismatches?
• NCBI taxonomy: data quality, integrity and usability?
• Transitively closing the Synonyms table and Vernacular table
• Building an interface
• Spell checkers
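A rough sketch of how the statistics listed above might be computed once the ITIS and NCBI names are available as sets of strings. The function names, the regular expression used to recognise a binomial, and the toy name sets are assumptions; "letter mismatches" is interpreted here as Levenshtein edit distance, which is one possible reading.

```python
import re


def edit_distance(a, b):
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


# Hypothetical heuristic: capitalised genus followed by lowercase epithet.
BINOMIAL = re.compile(r"^[A-Z][a-z]+ [a-z]+$")


def name_stats(itis, ncbi, max_mismatch=3):
    overlap = itis & ncbi
    unique_ncbi = ncbi - itis
    binomials = {n for n in unique_ncbi if BINOMIAL.match(n)}
    # For each NCBI-only name, look for an ITIS name within 1..max_mismatch
    # edits. This is quadratic, so it only suits small samples.
    near = {}
    for n in sorted(unique_ncbi):
        for candidate in sorted(itis):
            if 1 <= edit_distance(n, candidate) <= max_mismatch:
                near[n] = candidate
                break
    return {
        "overlap": overlap,
        "unique_ncbi": unique_ncbi,
        "unique_ncbi_binomials": binomials,
        "near_matches": near,
    }


# Toy data for illustration only, not real ITIS or NCBI content.
itis_names = {"Caenorhabditis elegans", "Homo sapiens"}
ncbi_names = {"Homo sapiens", "Caenorabditis elegans",
              "environmental sample 256"}
stats = name_stats(itis_names, ncbi_names)
```

At the scale of the full ITIS and NCBI name lists, the pairwise comparison would need an index or a blocking step (for example, comparing only names that share a first letter and similar length) to stay tractable.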
Lots of Questions?
• How do we use this to build taxonomically aware databases?
• How about updates to the data?
• Database links, Web services, simple DB cross-references?
• Use the GenBank model?
• Open to suggestions and ideas!
• Do we need to think about:
• PhyloCode?
• Type specimens?