Enhancing Proneural Differentiation Research: Insights from the GMOD Project
This talk by Lincoln Stein (Cold Spring Harbor Laboratory) introduces the GMOD project through a test case: Drosophila neurobiologist Michael Caudy, who studies proneural differentiation and needs a model organism database (MOD) for large-scale data on transcription factor binding site clusters and their relationship to the Notch signaling pathway. Caudy, who has no computer science training, took a "bioinformatics for biologists" course and set out to solve the "simple" problem of finding the binding site code that controls proneural differentiation. The database he needs would support gene exploration, literature citations, and links to external resources such as FlyBase and GenBank.
Presentation Transcript
The GMOD Project Lincoln Stein Cold Spring Harbor Laboratory
Test Subject: Michael Caudy • Drosophila neurobiologist • Proneural differentiation • Notch pathway • HLH transcriptional activators/repressors • achaete/scute complex • No computer science training • Took my “bioinformatics for biologists” course
“Simple” Problem • Discover the transcriptional factor binding site code controlling proneural differentiation.
Regular Expression Search • Using achaete promoter as exemplar, search for combinations of known binding sites in particular architectures
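A minimal sketch of this kind of search in Python, offered as an illustration rather than the method used in the talk. The motifs are assumptions: CANNTG is the generic E-box consensus bound by bHLH proteins, while the second motif, the spacing window, and the toy sequence are hypothetical placeholders.

```python
import re

# Assumed motifs (not from the talk): the generic E-box consensus CANNTG,
# and a hypothetical second site used only to illustrate an "architecture".
EBOX = "CA[ACGT]{2}TG"
SITE_B = "CACGAG"  # hypothetical placeholder motif


def find_architectures(seq, first=EBOX, second=SITE_B, min_gap=5, max_gap=30):
    """Return (start, end, matched_text) for `first` followed by `second`
    with min_gap..max_gap bases between them.

    The pattern is wrapped in a lookahead so that overlapping candidate
    architectures are all reported.
    """
    pattern = re.compile(
        "(?=(%s[ACGT]{%d,%d}?%s))" % (first, min_gap, max_gap, second)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        text = m.group(1)
        hits.append((m.start(1), m.start(1) + len(text), text))
    return hits


# Toy sequence, invented for the example (not the real achaete promoter).
toy_promoter = "ttgaCAGCTGaaattcccggCACGAGttt"
for start, end, text in find_architectures(toy_promoter):
    print(start, end, text)
```

Changing the order of the motifs or the gap bounds is how a different "architecture" would be expressed in this sketch.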
Mike’s Got Lots of Data • 90-11,000 TF binding site clusters • 100s-1000s of genes • millions of interactions • Which genes are involved in neural differentiation? • Which have interactions with the pathway? • Which have suggestive mutant phenotypes?
Mike Needs a Database • Database management system for proneural differentiation genes. • Visualization/exploration tools for relationship of genes to putative TF clusters. • Literature citations • Link out to FlyBase, GenBank & other DBs. • Add notes and other annotations.
Try to do it with Filemaker • “Cluster-centric” vs “gene-centric”? • Data import from FlyBase? • Storing images? • Maintaining relationships between genes & clusters? • Updates?
Mike Needs a MOD • Model Organism Database • Repository for reagents • Stocks, vectors, clones • Genetic & physical maps • Large-scale data sets • Genome • EST sets, microarray results, two-hybrid interactions • Literature • Ontologies & Nomenclature • Meetings, announcements
How WormBase Works • [Diagram labels] You • Web server • Perl scripts • Images, Movies • Database access library • Genomic Data • ACeDB • MySQL
Sorry Mike • WormBase website difficult to install • Data model nematode-centric • Data entry tools very process-specific • Customization difficult • Software documentation uneven • Standard operating procedure documentation uneven
MOD Redux • SGD, MGD, FlyBase, TAIR, RGD… • The same basic idea as WormBase • Implementation entirely different • Wheel reinvented many times • Little software sharing • This madness must stop!
The GMOD Project • Portable, open source software to support model organism databases • Multiple MODs involved • Worm, fly, yeast, mouse, Arabidopsis, rat, monocot, [fugu], [E. coli] • Funded by NIH as of June 2002 • Programmers, coordinator, quarterly meetings • http://www.gmod.org
The GMOD Pyramid • Modular Applications • Modular Schema • Open Source DBMS & Middleware
A MOD Construction Set • Application Layer: map browser, map editor, annotation pipeline, genome browser, genome editor, citation browser, citation editor • Middleware Layer: Bioperl, BioJava, BioPython • Database Layer: genomes, maps, citations • [Diagram labels: genetic maps, literature, genome]
Chado – Modular Schema • Common schema for use by FlyBase and WormBase • Ontology Driven • Small number of generic tables e.g. “feature” • Controlled vocabulary names object types and relationships among them: • “achaete protein is a HLH activator” • “m8 protein inhibits achaete transcription” • Evidence-Savvy
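The ontology-driven idea on this slide can be sketched as follows. This is a deliberately simplified illustration in Python with SQLite, loosely modeled on the "small number of generic tables" described above; it is not the actual Chado schema. The column layout and the helper functions `term`, `feature`, and `relate` are assumptions made for the example, as are the type names "protein" and "transcription".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (               -- controlled vocabulary: types and relations
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT UNIQUE NOT NULL
);
CREATE TABLE feature (              -- one generic table for every kind of object
    feature_id INTEGER PRIMARY KEY,
    name       TEXT UNIQUE NOT NULL,
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
CREATE TABLE feature_relationship ( -- subject --type--> object
    subject_id INTEGER NOT NULL REFERENCES feature(feature_id),
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
    object_id  INTEGER NOT NULL REFERENCES feature(feature_id)
);
""")


def term(name):
    """Return the id of a vocabulary term, creating it if needed."""
    conn.execute("INSERT OR IGNORE INTO cvterm (name) VALUES (?)", (name,))
    row = conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def feature(name, type_name):
    """Return the id of a feature, creating it with the given type if needed."""
    conn.execute(
        "INSERT OR IGNORE INTO feature (name, type_id) VALUES (?, ?)",
        (name, term(type_name)),
    )
    row = conn.execute(
        "SELECT feature_id FROM feature WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def relate(subject_id, relation, object_id):
    """Record 'subject relation object' using a vocabulary term as the relation."""
    conn.execute(
        "INSERT INTO feature_relationship VALUES (?, ?, ?)",
        (subject_id, term(relation), object_id),
    )


# "achaete protein is a HLH activator": the type is itself a vocabulary term.
feature("achaete protein", "HLH activator")
# "m8 protein inhibits achaete transcription": a typed relationship.
m8 = feature("m8 protein", "protein")
ac_tx = feature("achaete transcription", "transcription")
relate(m8, "inhibits", ac_tx)

rows = conn.execute("""
    SELECT s.name, r.name, o.name
    FROM feature_relationship fr
    JOIN feature s ON s.feature_id = fr.subject_id
    JOIN cvterm  r ON r.cvterm_id  = fr.type_id
    JOIN feature o ON o.feature_id = fr.object_id
""").fetchall()
print(rows)
```

The point of the design is that adding a new kind of object or relationship means adding a vocabulary term, not a new table.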
GMOD Applications • Apollo genome annotation editor • GBrowse generic genome browser • PubSearch literature curation editor • CMap comparative map browser • IMD insertional mutagenesis database management system
Apollo Data adapters • Parser -> data models -> display • Existing data adapters • GAME XML • GFF • Ensembl CGI server • DAS • Write your own data adapter! • Extend AbstractDataAdapter class • Display options defined in config file
Who is Using Apollo? • BDGP • Reannotated Drosophila genome • Bristol-Myers Squibb • Launching Apollo from web browser via MIME types • GNF • JDBC adapter layer over BioSQL • Biogen • View human genome alignment between public and Biogen internal database • Connected BLAT pipeline to Apollo • HGMP-RC Fugu Genomics group • Displaying annotations on fugu scaffolds
Extensively Customizable • End-user • Turn tracks on and off, change order, change packing & labeling attributes (stored in cookie) • Data provider • Change fonts, colors, text. • Change overview – genetic map, contigs, coverage, karyotype. • Define new tracks using simple config file. • Tinker with track appearance to heart’s content.
Adding a New Track
(a) Create a GFF file named “deletions.gff”
    Chr1  targeted  deletion  1293224  1294901  .  .  .  Deletion d101k2
    Chr1  targeted  deletion  8239811  8241116  .  .  .  Deletion d680k2
    Chr2  targeted  deletion  5866382  5866500  .  .  .  Deletion d007k2
(b) Run the load_gff.pl script
    > load_gff.pl -d example_database deletions.gff
    Loading features…
    Done. 3 features loaded.
(c) Add a new track “stanza” to the gbrowse configuration file
    [Knockout]
    feature  = deletion
    glyph    = span
    fgcolor  = red
    key      = Knockouts
    link     = http://example.org/cgi-bin/knockout_details?$name
    citation = These are deletion knockouts produced by the example knockout consortium (http://example.org/knockouts.html)
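To make the relationship between the GFF file and the track stanza concrete, here is a small Python sketch that parses the three example lines and mimics what the `feature` and `link` settings ask for. It is an illustration only, not GBrowse's or load_gff.pl's own code. It assumes the nine columns are tab-separated and that the last column holds a class and a name; the names `Feature`, `parse_gff`, and `track_links` are hypothetical.

```python
from collections import namedtuple

# Nine GFF columns; the last one is split into a class and a name.
Feature = namedtuple(
    "Feature",
    "ref source method start end score strand phase group_class name",
)

# The example lines from step (a), written with explicit tabs.
GFF_TEXT = "\n".join([
    "Chr1\ttargeted\tdeletion\t1293224\t1294901\t.\t.\t.\tDeletion d101k2",
    "Chr1\ttargeted\tdeletion\t8239811\t8241116\t.\t.\t.\tDeletion d680k2",
    "Chr2\ttargeted\tdeletion\t5866382\t5866500\t.\t.\t.\tDeletion d007k2",
])

# The link template from the [Knockout] stanza in step (c).
LINK_TEMPLATE = "http://example.org/cgi-bin/knockout_details?$name"


def parse_gff(text):
    """Parse tab-separated GFF lines into Feature records."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError("expected 9 tab-separated columns: %r" % line)
        group_class, name = cols[8].split(None, 1)
        features.append(Feature(
            cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
            cols[5], cols[6], cols[7], group_class, name,
        ))
    return features


def track_links(features, method="deletion"):
    """Select features of one type and fill $name into the link template."""
    return [
        LINK_TEMPLATE.replace("$name", f.name)
        for f in features
        if f.method == method
    ]


features = parse_gff(GFF_TEXT)
print(len(features), "features parsed")
for url in track_links(features):
    print(url)
```

The stanza's `feature = deletion` line corresponds to the `method` filter here, and `$name` is replaced by the name taken from the last GFF column.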
Extensively Extensible • [Architecture diagram labels] Apache Web Server • gbrowse CGI script • Plugins • Glyphs • Bio::Graphics library • BioPerl library • Adaptors: Oracle adaptor, Flat File adaptor, Bio::DB::GFF adaptor, Chado adaptor • Data stores: Oracle, MySQL/Postgres, Flat Files
GBrowse on GenBank? GBrowse on GenBank! • [Architecture diagram labels] Apache Web Server • gbrowse CGI script • Plugins • Glyphs • Bio::Graphics library • BioPerl library • Adaptors: GenBank Proxy Adaptor, Bio::DB::GFF adaptor • Data stores: GenBank, MySQL
Who is Using GBrowse? • GMOD Members • WormBase, FlyBase, RatDB • HGMP-RC Fugu genomics group • KEGG (multiple microorganisms) • Ingenium AG (mouse) • Bristol-Myers Squibb (Drosophila) • Texas A&M University (Salmonella) • McGill University (human chr7) • Institute for Systems Biology (human)