GUS Overview June 18, 2002
GUS-3.0: Genomics Unified Schema
• Supports application and data integration
• Uses an extensible architecture
• Is object-oriented, even though it is built on a relational database management system (Oracle)
• Uses a data warehouse rather than federation, so a stable local copy of the data is kept
• Uses standards for bulk data exchange (e.g., MAGE)
GUS Usage
• Annotation
  • of genomes: gene models, sequence features
  • of genes: gene function, gene expression, gene regulation
• Data mining
  • Develop algorithms and a queryable resource
• Publish
  • Map identifiers to other resources/databases
  • Provide URLs for entry retrieval and ad hoc queries through a web interface
GUS-3.0 Name Spaces
GUS has five name spaces (Core, SRes, DoTS, RAD, TESS) that compartmentalize different types of information.
Application Integration: PlasmoDB
[Diagram; the legend distinguishes existing and future implementation]
• Data sources: public databases (GenBank, InterPro, GO, etc.); TIGR, Sanger, Stanford; Plasmodium investigators
• Data types: genomic sequence; microarray & SAGE experiments; GSSs & ESTs; mapping data; QTL, POP, SNP, clinical
• Processing: automated analysis & integration; annotation through the Annotator's Interface
• Storage: GUS name spaces (DoTS, TESS, RAD, Core, SRes) in Oracle/SQL, accessed through the object layer
• Delivery: Java servlets & Perl CGI for WWW queries, browsing, & download; GenePlot software and GenePlot CD
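GUS Supports Multiple Projects
[Diagram] The GUS name spaces (DoTS, RAD, TESS, SRes, Core) live in an Oracle RDBMS, are populated through an object layer for data loading, and serve multiple projects (AllGenes, PlasmoDB, EPConDB, and other sites and projects) through Java servlets.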
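The object layer for data loading lets a loader build parent and child objects in memory and submit them together, with foreign keys filled in automatically. The sketch below illustrates that idea only; it is not the GUS object layer's actual API. The `Row` class, `add_child`, `submit`, and the toy `Gene`/`RNA` tables are hypothetical, and SQLite stands in for Oracle.

```python
import sqlite3
from typing import Dict, List, Optional


class Row:
    """Hypothetical object-layer row: one instance maps to one table row."""

    def __init__(self, table: str, **attrs: object) -> None:
        self.table = table
        self.attrs: Dict[str, object] = dict(attrs)
        self.children: List["Row"] = []
        self.fk_column: Optional[str] = None  # column in this row that points at its parent
        self.id: Optional[int] = None

    def add_child(self, child: "Row", fk_column: str) -> None:
        """Attach a child row; its fk_column is set to this row's id on submit."""
        child.fk_column = fk_column
        self.children.append(child)

    def submit(self, conn: sqlite3.Connection, parent_id: Optional[int] = None) -> int:
        """Insert this row first, then its children, so parent ids exist for the foreign keys."""
        attrs = dict(self.attrs)
        if self.fk_column is not None:
            attrs[self.fk_column] = parent_id
        cols = ", ".join(attrs)  # table and column names are trusted in this sketch
        marks = ", ".join("?" for _ in attrs)
        cur = conn.execute(
            f"INSERT INTO {self.table} ({cols}) VALUES ({marks})", tuple(attrs.values())
        )
        self.id = cur.lastrowid
        for child in self.children:
            child.submit(conn, self.id)
        return self.id


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Gene (gene_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE RNA (rna_id INTEGER PRIMARY KEY,
                      gene_id INTEGER REFERENCES Gene(gene_id),
                      name TEXT);
    """
)

gene = Row("Gene", name="geneX")
gene.add_child(Row("RNA", name="geneX-mRNA"), fk_column="gene_id")
gene_id = gene.submit(conn)
print(conn.execute("SELECT gene_id, name FROM RNA").fetchall())
```

Submitting parents before children is what lets the loader fill in each child's foreign key without the caller ever handling raw ids.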
Main Aspects of GUS Development
• Choice of development tools
  • Schema:
    • CREATE TABLE statements
    • Documentation plug-in: input is tab-delimited text
    • UML: Rational Rose, PowerDesigner
  • Code: CVS
• Areas to emphasize
  • Plug-ins
  • Workflow
  • TESS
  • Proteomics
  • Images
• Preferred type of user interface
  • JSP
  • PHP
Data Integration
• Genomic Sequence (DoTS): genes, gene models; STSs, repeats, etc.; cross-species analysis
• Transcribed Sequence (DoTS): characterize transcripts; RH mapping; library analysis; cross-species analysis
• Protein Sequence: domains; function; structure; cross-species analysis
• Ontologies (SRes): GO; species; tissue; developmental stage
• Transcript Expression (RAD): arrays; SAGE; conditions
• Gene Regulation (TESS): binding sites; patterns; grammars
• Data Provenance (Core): ownership; protection; algorithms; similarity; versioning; workflow
Example integrated query: transcription factors up-regulated in acute myeloid leukemia that have sequence similarity to c-fos and share common promoter motifs.
[Diagram: GUS, with RAD and TESS, links EST clustering and assembly, genomic alignment and comparative sequence analysis, and identification of shared TF binding sites]
GUS Approach to Schema
• Think objects
  • Parents and children
  • Subclassing with views
• Views
  • Start with a generic Imp table (e.g., NAFeatureImp) that contains the base attributes plus generic attributes of various datatypes
  • The superclass view (e.g., NAFeature) exposes only the base attributes
  • Subclass views (e.g., RNAFeature) add attributes by mapping the generic attributes to named columns
• Strongly typed
  • Tend to avoid "name-value" pairs
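The view-based subclassing pattern can be sketched with SQLite from Python. GUS itself runs on Oracle, and the column names here (`subclass_view`, `string1`, `int1`, `rna_type`, `exon_count`) are hypothetical simplifications rather than the actual GUS 3.0 definitions; only the table and view names come from the slide.

```python
import sqlite3

# Hypothetical, simplified schema: one physical Imp table, two views over it.
DDL = """
CREATE TABLE NAFeatureImp (
    na_feature_id INTEGER PRIMARY KEY,
    subclass_view TEXT NOT NULL,   -- which subclass view a row belongs to
    name          TEXT,            -- base attribute shared by every feature
    string1       TEXT,            -- generic attributes, reused by subclasses
    int1          INTEGER
);

-- Superclass view: base attributes only, covering every row.
CREATE VIEW NAFeature AS
    SELECT na_feature_id, subclass_view, name FROM NAFeatureImp;

-- Subclass view: renames generic columns to strongly typed, named attributes
-- and restricts to the rows of this subclass.
CREATE VIEW RNAFeature AS
    SELECT na_feature_id, name,
           string1 AS rna_type,
           int1    AS exon_count
    FROM NAFeatureImp
    WHERE subclass_view = 'RNAFeature';
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.executemany(
    "INSERT INTO NAFeatureImp (subclass_view, name, string1, int1) VALUES (?, ?, ?, ?)",
    [
        ("RNAFeature", "rna-1", "mRNA", 4),
        ("ExonFeature", "exon-1", None, None),
    ],
)

print(conn.execute("SELECT COUNT(*) FROM NAFeature").fetchone()[0])
print(conn.execute("SELECT name, rna_type, exon_count FROM RNAFeature").fetchall())
```

The superclass view sees both rows, while the subclass view sees only its own rows under meaningful column names. This is how a single physical table can back a strongly typed class hierarchy without resorting to name-value pairs.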
DoTS Central Dogma
[Diagram]
• Gene: Gene Instance, Gene Feature, Genomic Sequence
• RNA: RNA Instance, RNA Feature, RNA Sequence
• Protein: Protein Instance, Protein Feature, Protein Sequence
• Superclasses: NA Feature and NA Sequence (gene and RNA levels); AA Feature and AA Sequence (protein level)
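One way to read the central-dogma diagram as objects is sketched below with Python dataclasses. This is an assumed reading, not the GUS object layer. Each abstract entity (Gene, RNA, Protein) keeps a list of located features, and each list entry stands in for an Instance row linking the entity to a feature on a sequence. The class layout, field names, coordinate convention, and toy data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NASequence:
    """Nucleic-acid sequence: genomic DNA or an RNA (transcript) sequence."""
    accession: str
    sequence: str


@dataclass
class AASequence:
    """Amino-acid (protein) sequence."""
    accession: str
    sequence: str


@dataclass
class NAFeature:
    """Located region on an NA sequence (plays the role of a Gene or RNA Feature)."""
    na_sequence: NASequence
    start: int  # 1-based, inclusive (assumed convention)
    end: int


@dataclass
class AAFeature:
    """Located region on an AA sequence (plays the role of a Protein Feature)."""
    aa_sequence: AASequence
    start: int
    end: int


@dataclass
class Protein:
    name: str
    instances: List[AAFeature] = field(default_factory=list)  # ~ Protein Instance rows


@dataclass
class RNA:
    name: str
    instances: List[NAFeature] = field(default_factory=list)  # ~ RNA Instance rows
    proteins: List[Protein] = field(default_factory=list)


@dataclass
class Gene:
    name: str
    instances: List[NAFeature] = field(default_factory=list)  # ~ Gene Instance rows
    rnas: List[RNA] = field(default_factory=list)


# Hypothetical toy data: an 18-nt ORF encoding MKPGF followed by a stop codon.
genomic = NASequence("genomic-1", "ATGAAACCCGGGTTTTAG")
transcript = NASequence("rna-1", "AUGAAACCCGGGUUUUAG")
protein_seq = AASequence("prot-1", "MKPGF")

protein = Protein("geneX-protein", instances=[AAFeature(protein_seq, 1, 5)])
rna = RNA("geneX-mRNA", instances=[NAFeature(transcript, 1, 18)], proteins=[protein])
gene = Gene("geneX", instances=[NAFeature(genomic, 1, 18)], rnas=[rna])

print(gene.rnas[0].proteins[0].instances[0].aa_sequence.sequence)
```

Keeping the abstract entity separate from its located features lets one gene carry several gene models, or one protein be placed on several sequences, without duplicating the entity itself.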
DoTS Schema Has Been Driven By Building Gene Indices
[Workflow diagram]
• Inputs: genomic sequence; mRNA/EST sequences
• Clustering and assembly of mRNAs/ESTs yields DoTS consensus sequences
• Gene predictions on genomic sequence (GenScan/HMMer, PHAT) give predicted genes; consensus sequences are aligned to the genome (SIM4 or BLAT)
• Merge genes; gene/RNA cluster assignment
• RNAs are translated into proteins (framefinder)
• Computed annotation: BLAST similarities (BLASTX, BLASTP); protein motifs (PFAM, Smart, ProDom); functional predictions (GO functions); other (EPCR, AssemblyAnatomyPercent, index key words, SNP analysis)
• Annotate DoTS: manual annotation tasks produce the gene index
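The gene-index workflow can be viewed as a dependency graph whose steps must run in a valid order. The sketch below orders such a graph with Python's standard `graphlib` (Python 3.9+). The step names are hypothetical, and the dependencies are one reading of the diagram rather than the actual DoTS pipeline definition.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical step names; each step maps to the set of steps it depends on.
GENE_INDEX_STEPS = {
    "cluster_and_assemble": {"load_est_mrna_sequences"},
    "build_consensus_sequences": {"cluster_and_assemble"},
    "predict_genes": {"load_genomic_sequence"},               # GenScan/HMMer, PHAT
    "align_to_genome": {"build_consensus_sequences",
                        "load_genomic_sequence"},             # SIM4 or BLAT
    "merge_genes": {"align_to_genome", "predict_genes"},
    "assign_gene_rna_clusters": {"merge_genes"},
    "translate_rnas": {"build_consensus_sequences"},          # framefinder
    "blastx_similarities": {"build_consensus_sequences"},
    "protein_annotation": {"translate_rnas"},                 # BLASTP, PFAM, Smart, ProDom
    "manual_annotation": {"assign_gene_rna_clusters",
                          "blastx_similarities",
                          "protein_annotation"},
}

# static_order() yields every step after all of its dependencies.
order = list(TopologicalSorter(GENE_INDEX_STEPS).static_order())
for position, step in enumerate(order, start=1):
    print(position, step)
```

A topological order guarantees that no step starts before its inputs exist. Independent branches, such as gene prediction and EST clustering, could in principle run in parallel.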
DoTS Gene Indices Are Based on Clustering and Assembling ESTs
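The clustering half of "clustering and assembling ESTs" can be illustrated with single-linkage clustering: any two ESTs joined by a chain of overlaps end up in the same cluster. The sketch below is not the DoTS pipeline. It assumes the pairwise overlaps have already been computed (for example, by an all-against-all alignment), uses a union-find structure, and leaves out the assembly of each cluster into a consensus sequence. `cluster_ests` and the EST identifiers are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def cluster_ests(est_ids: Iterable[str],
                 overlaps: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Single-linkage clustering of ESTs from precomputed pairwise overlaps."""
    parent: Dict[str, str] = {e: e for e in est_ids}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Union the clusters of every overlapping pair.
    for a, b in overlaps:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Group ESTs by their root; sort for a deterministic result.
    clusters: Dict[str, List[str]] = {}
    for e in parent:
        clusters.setdefault(find(e), []).append(e)
    return sorted(sorted(members) for members in clusters.values())


ests = ["est1", "est2", "est3", "est4", "est5"]
pairs = [("est1", "est2"), ("est2", "est3"), ("est4", "est5")]
print(cluster_ests(ests, pairs))
```

Single linkage is transitive, so est1 and est3 share a cluster even though they never overlap directly. This transitivity is also why chimeric or repetitive sequences can merge unrelated genes if they are not filtered first.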
RAD 3.0 Schema Incorporates MAGE and Experience With Microarrays
• LIMS for data analysis
• Also holds SAGE data
Status of GUS Namespaces
• Core
  • Tables exist; workflow documented
• SRes
  • Tables exist
• DoTS
  • Tables exist; some documentation
• RAD
  • Version 3.0 to include MAGE and experience with microarrays
  • Pretty much complete
  • Tables exist; mostly documented
• TESS
  • Tables ready but not yet created
Schema Development
• Releases on SourceForge:
  • CREATE TABLE statements
  • Table dumps from Core::TableInfo and Core::DatabaseDocumentation
  • GIFs of ER diagrams
• Adding tables between releases
  • In the CVS tree?
  • Use the message forum for discussion
Documentation
• The Schema Browser reads Core::TableInfo
• Plug-in
  • Populates Core::DatabaseDocumentation
  • Input, one line per table or attribute:
    Table\t\tDescription of table
    Table\tAttribute\tDescription of attribute
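The plug-in's tab-delimited input (a table line with an empty attribute field, or a table/attribute/description line) is simple to parse. The sketch below shows one way to read it into memory before loading. `parse_documentation`, the `(table, attribute)` key, and the sample descriptions are hypothetical, and writing the result into DatabaseDocumentation is left out.

```python
from typing import Dict, Optional, Tuple

DocKey = Tuple[str, Optional[str]]  # (table, attribute); attribute None = table-level


def parse_documentation(text: str) -> Dict[DocKey, str]:
    """Parse tab-delimited documentation lines into {(table, attribute): description}."""
    docs: Dict[DocKey, str] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        # maxsplit=2 lets a description itself contain tabs
        fields = line.split("\t", 2)
        if len(fields) != 3:
            raise ValueError(
                f"line {lineno}: expected 3 tab-separated fields, got {len(fields)}"
            )
        table, attribute, description = fields
        # An empty middle field marks a description of the table itself.
        docs[(table, attribute or None)] = description
    return docs


sample = (
    "Gene\t\tAbstract gene entity (hypothetical description)\n"
    "Gene\tname\tGene symbol (hypothetical description)\n"
)
docs = parse_documentation(sample)
print(docs[("Gene", None)])
print(docs[("Gene", "name")])
```

Rejecting malformed lines with a line number, instead of skipping them silently, makes errors in hand-edited documentation files easy to find.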
GUS Schema Browser
• http://www.cbil.upenn.edu/cgi-bin/GUS30/schemaBrowser.pl?db=GUS30
• Points at GUS30 on the CBIL development database server (erebus)
  • Need to move it? Maintain a release view?
• DoTS tables:
  • Central dogma
  • Evidence/Similarity
  • ProjectLink
  • SequenceGroupImp/SequenceGroupExperimentImp
  • Plasmomap?
  • Other tables of interest?