The WordSeeker Functional Genomics Toolkit
Lonnie Welch, Stuckey Professor
Bioinformatics Laboratory
Electrical Engineering and Computer Science
Biomedical Engineering Program
Molecular and Cellular Biology Program
Ohio University
welch@ohio.edu
The genome: Genes: 3%, Junk: 97%
"So much junk DNA in our genome." (S. Ohno, 1972)
"DNA differs from written language in that islands of sense are separated by a sea of nonsense, never transcribed." (Richard Dawkins, 2004)
“The aim of the ENCODE (encyclopaedia of DNA elements) project is to identify every sequence with functional properties in the human genome.”
Some highlights of the pilot phase of this project:
• involved an analysis of 1% (30 megabases) of the human genome
• remarkably:
  • much functional information is not “conserved” across organisms
  • up to 93% of bases in the ENCODE regions are transcribed
• not good news for genes, which will no longer be able to hog the limelight
• the genome is much more than a mere vehicle for genes
John M. Greally, Genomics: Encyclopaedia of humble DNA, Nature 447, 782-783 (14 June 2007).
“Perhaps it is time to bid farewell to the term ‘junk’ DNA – we knew not your true nature.” (Regulatory RNAs and the demise of ‘junk’ DNA. Genome Biology 2006, 7:328)
The genome: Genes; Functional elements: 90%??; Junk: 10%??
"...a certain amount of hubris was required for anyone to call any part of the genome 'junk,' given our level of ignorance." (Francis Collins, 2006)
WordSeeker Users
• OU
  • Sarah Wyatt (NSF, NASA) – plant gravitropism; regulatory genomics
  • Allan Showalter (NSF, USDA) – cell wall genes; functional genomics
  • Susan Evans (NIH) – regulatory aspects of cancer
• OARDC
  • Eric Stockinger (NSF) – cold tolerance in crops
• OSU
  • Erich Grotewold (NSF, USDA, DOE) – genome-wide regulatory genomics
  • Rebecca Lamb (NSF) – cell development
• BGSU
  • Paul Morris (NSF, DOE) – homology in Oomycete promoters
• National Human Genome Research Institute (NIH)
  • Laura Elnitski – regulatory aspects of cancer
• Centers for Disease Control
  • Henry Wan – avian flu
Genome Database
• organized in six major organism groups: Archaea, Bacteria, Eukaryotae, Viruses, Viroids, and Plasmids
• provides views for a variety of genomes, complete chromosomes, sequence maps with contigs, and integrated genetic and physical maps
Source: National Center for Biotechnology Information, April 2008.
Additional Suggestions (Prof. Frank Drews, OU EECS)
Desirable hardware features:
• Memory-intensive applications utilizing many cores will saturate the front-side buses
• A large number of cores requires high front-side bus bandwidth
• Large last-level caches
• Equip cluster nodes with the new Graphics Processing Units (GPUs), such as NVIDIA's GeForce 8800 series, for memory-intensive algorithms
  • Some processing can be off-loaded to these GPUs
  • A number of recent bioinformatics algorithms run on these GPUs and show impressive speed-ups
  • E.g., M. Schatz, C. Trapnell, A. Delcher, A. Varshney, High-throughput sequence alignment using Graphics Processing Units, BMC Bioinformatics, Vol. 8, No. 1 (2007)
Ohio Bioinformatics Consortium
• Statewide Bioinformatics Curriculum
  • Comprehensive curriculum
  • Shared courses
  • Managed by Ralph Regula School
• Bioinformatics Research Infrastructure
  • State-of-the-art
  • Biological researchers define requirements
  • Bioinformatics researchers design algorithms to meet requirements
  • Ohio Supercomputer Center integrates, hosts and supports bioinformatics software
$9M will be invested over the next 5 years via Choose Ohio First.