Meeting the Bioinformatics Challenges of Functional Genomics

Meeting the Bioinformatics Challenges of Functional Genomics VanBUG 11 September 2003

Acknowledgments <johnq@tigr.org>
TIGR Human/Mouse/Arabidopsis Expression Team: Emily Chen, Bryan Frank, Renee Gaspard, Jeremy Hasseman, Lara Linford, Fenglong Liu, Simon Kwong, John Quackenbush, Shuibang Wang, Yonghong Wang, Ivana Yang, Yan Yu
Array Software Hit Team: Nirmal Bhagabati, John Braisted, Tracey Currier, Jerry Li, Wei Liang, John Quackenbush, Alexander I. Saeed, Vasily Sharov, Mathangi Thiagarajan, Joseph White
Assistant: Sue Mineo
The TIGR Gene Index Team: Foo Cheung, Svetlana Karamycheva, Yudan Lee, Babak Parvizi, Geo Pertea, Razvan Sultana, Jennifer Tsai, John Quackenbush, Joseph White
H. Lee Moffitt Center/USF: Timothy J. Yeatman, Greg Bloom
PGA Collaborators: Gary Churchill (TJL), Greg Evans (NHLBI), Harry Gavras (BU), Howard Jacob (MCW), Anne Kwitek (MCW), Allan Pack (Penn), Beverly Paigen (TJL), Luanne Peters (TJL), David Schwartz (Duke)
Emeritus: Jennifer Cho (TGI), Ingeborg Holt (TGI), Feng Liang (TGI), Kristie Abernathy (mA), Sonia Dharap (mA), Julie Earle-Hughes (mA), Cheryl Gay (mA), Priti Hegde (mA), Rong Qi (mA), Erik Snesrud (mA), Heenam Kim (mA)
TIGR PGA Collaborators: Norman Lee, Renae Malek, Hong-Ying Wang, Truong Luu, Bobby Behbahani
Funding provided by the Department of Energy and the National Science Foundation
Funding provided by the National Cancer Institute, the National Heart, Lung, and Blood Institute, and the National Science Foundation
TIGR Faculty, IT Group, and Staff

Acknowledgments <johnq@tigr.org>
Thanks to Syntek, Inc. <http://www.syntek.com> for the GeneShaving MeV module and assistance with MyMADAM
Thanks to DataNaut, Inc. <http://www.datanaut.com> for the RelNet and Terrain Map modules and assistance with Client/Server MeV
<tm4@tigr.org>

Science is built with facts as a house is with stones – but a collection of facts is no more a science than a heap of stones is a house. – Jules Henri Poincare

There are 10^11 stars in the galaxy. That used to be a huge number. But it's only a hundred billion. It's less than the national deficit! We used to call them astronomical numbers. Now we should call them economical numbers. - Richard Feynman, physicist, Nobel laureate (1918-1988)

Microarray Analysis at TIGR

Step 1: Experimental Design

Step 2: Data Collection

Step 3: Data Analysis

Step 4: Consulting with the ArraySW gang in the trailer

Step 5: Sharing data with our collaborators

Steps in the Process
• Select array elements and annotate them
• Build a database to manage stuff
• Print arrays and manage the lab
• Hybridize and analyze images; manage data
• Analyze hybridization data and get results

TIGR Gene Indices home page: www.tigr.org/tdb/tgi
~60 species
>16,000,000 sequences

TGICL Tools are available, with more coming
Geo Pertea, Razvan Sultana, Valentin Antonescu
Available with source

Gene Index Assembly process
Inputs: ESTs from GenBank (dbEST); Expressed Transcripts (ET) from GenBank CDS; TIGR ESTs
• Remove vector, poly-A, adapter, mitochondrial and ribosomal sequence
• Reduce redundancy
• High stringency pair-wise comparisons to build Clusters
• Each cluster is assembled to obtain Tentative Consensus sequences (TCs)
• Annotate TCs and release
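The cluster-building step can be pictured as taking the transitive closure over pairwise matches that pass a stringency filter. The sketch below is only an illustration of that idea using a union-find structure; the function name, the input tuple layout, and the `min_identity` and `min_overlap` thresholds are assumptions made for the example, not values or code taken from the TGICL tools.

```python
from collections import defaultdict

def build_clusters(seq_ids, pairwise_hits, min_identity=0.95, min_overlap=40):
    """Group sequences by transitive closure over stringent pairwise hits.

    pairwise_hits: iterable of (id_a, id_b, identity, overlap_length).
    The thresholds are illustrative assumptions, not the pipeline's settings.
    """
    parent = {s: s for s in seq_ids}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, overlap in pairwise_hits:
        # Only high-stringency matches are allowed to join two clusters.
        if identity >= min_identity and overlap >= min_overlap:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = defaultdict(list)
    for s in seq_ids:
        clusters[find(s)].append(s)
    return sorted(sorted(members) for members in clusters.values())


hits = [
    ("est1", "est2", 0.98, 120),
    ("est2", "et1", 0.97, 80),
    ("est3", "est4", 0.90, 200),  # fails the identity filter
]
print(build_clusters(["est1", "est2", "est3", "est4", "et1"], hits))
```

Each resulting cluster would then be handed to an assembler to produce a TC; sequences with no stringent match remain as singletons.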

The Mouse Gene Index <http://www.tigr.org/tdb/mgi>

A TC Example

GO Terms and EC Numbers Babak Parvizi

The TIGR Gene Indices <http://www.tigr.org/tdb/tgi>
Dan Lee, Ingeborg Holt

Building TOGs: Reflexive, Transitive Closure
Tentative Orthologues and Paralogues
Thanks to Woytek Makałowski and Mark Boguski
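One way to read "reflexive, transitive closure" is: keep only cross-species matches that are each other's best hit, then merge those pairs into groups. The sketch below follows that reading, which is an assumption about the slide rather than a description of the actual TOGA pipeline; the input layout, function names, and the `min_species` parameter are all hypothetical.

```python
from collections import defaultdict

def reciprocal_pairs(best_hit, species_of):
    """best_hit maps (tc_id, target_species) -> best-matching tc_id in that species."""
    pairs = set()
    for (query, _target_species), hit in best_hit.items():
        # The match counts only if the hit's best match back in the
        # query's own species is the query itself.
        if best_hit.get((hit, species_of[query])) == query:
            pairs.add(frozenset((query, hit)))
    return pairs

def tentative_orthologue_groups(best_hit, species_of, min_species=2):
    """Merge reciprocal pairs into groups (connected components)."""
    neighbours = defaultdict(set)
    for pair in reciprocal_pairs(best_hit, species_of):
        a, b = tuple(pair)
        neighbours[a].add(b)
        neighbours[b].add(a)

    groups, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        # Keep groups that span enough species (illustrative cutoff).
        if len({species_of[tc] for tc in component}) >= min_species:
            groups.append(sorted(component))
    return groups


species_of = {"h1": "human", "h2": "human", "m1": "mouse", "m2": "mouse", "r1": "rat"}
best_hit = {
    ("h1", "mouse"): "m1", ("m1", "human"): "h1",
    ("m1", "rat"): "r1", ("r1", "mouse"): "m1",
    ("h2", "mouse"): "m1",  # not reciprocated: m1's best human hit is h1
    ("m2", "human"): "h2",  # not reciprocated: h2's best mouse hit is m1
}
print(tentative_orthologue_groups(best_hit, species_of))
```

In this toy input, h2 and m2 have one-directional matches only, so they are left out of the group; such cases are where paralogues would need separate handling.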

TOGA: A Sample Alignment: bithoraxoid-like protein

Gene Finding in Humans is easy! Razvan Sultana

Gene Finding in Humans is easy? Razvan Sultana

Gene Finding in Humans is difficult? Razvan Sultana

Gene Finding in Humans is difficult? A genome and its annotation is only a hypothesis that must be tested. Razvan Sultana

RESOURCERER Jennifer Tsai http://pga.tigr.org/tools.shtml

RESOURCERER: An Example

RESOURCERER: Using Genetic Markers Just added: Integrated QTLs

SOPs are available <http://pga.tigr.org/tools.shtml>
• PCR purification
• cDNA/template prep
• RNA labeling
• Printing
• Hybridization
Coming: Data QC SOP

What data should we collect? EVERYTHING
Nature Genetics 29, December 2001
MAGE-ML: XML-based data exchange format <http://www.mged.org>

MIAME Relational Schema

What’s Wrong with MIAME?
• MIAME was designed as a model for capturing information necessary to create public databases.
• MIAME-based databases lack LIMS capabilities, which are necessary for large-scale studies.
• We do not want to store images in our database for practical reasons: limited space.
• We needed to develop a variety of tools adapted to our existing infrastructure and legacy data and databases.
• Probes are labeled and applied to the arrays
• An “experiment” is a hybridization
• A “study” is a collection of hybridization experiments
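The terminology in the last three bullets (labeled probes, an experiment as one hybridization, a study as a collection of experiments) can be sketched as a small data model. The class and field names below are hypothetical and chosen for illustration; they are not the MAD table definitions.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Probe:
    """A labeled sample applied to an array (field names are illustrative)."""
    probe_id: str
    source: str   # where the RNA came from, e.g. a tissue or strain
    label: str    # fluorescent label used for this sample

@dataclass
class Experiment:
    """One hybridization: one slide plus the probes applied to it."""
    expt_id: str
    slide_id: str
    probes: List[Probe] = field(default_factory=list)

@dataclass
class Study:
    """A named collection of hybridization experiments."""
    name: str
    experiments: List[Experiment] = field(default_factory=list)

    def slides_used(self) -> List[str]:
        # Distinct slides across all hybridizations in the study.
        return sorted({e.slide_id for e in self.experiments})


study = Study("example study")
study.experiments.append(
    Experiment("expt-1", "slide-A", [Probe("p1", "control", "Cy3"), Probe("p2", "test", "Cy5")])
)
study.experiments.append(
    Experiment("expt-2", "slide-B", [Probe("p3", "control", "Cy5"), Probe("p4", "test", "Cy3")])
)
print(study.slides_used())
```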

MAD Microarray Database Schema

Conceptual Schema: MAD
Tables: Clone, Probe, Slide_type, Protocol, Primer_pair, Primer, New_plate, Slide, PCR, Study, Probe_source, Experiment, Expression, Expt_probe, Hyb, Spot, Gene, Scan, Analysis, Normalize

MADAM: Microarray Data Manager
Marie-Michelle Cordonnier-Pratt, UGA, converted MySQL to Oracle and made MADAM work!
Available with source and MySQL

ExpDesigner

Microarray Overview I
Diagram labels: Microarray Slide (with 60,000 or more spotted genes); Microtiter Plate; Microbial ORFs + Design PCR Primers; PCR Products; Eukaryotic Genes; Select cDNA clones; Many different plates containing different genes; For each plate set, many identical replicas; PCR Products

Microarray Overview
PCR Scorer: Reads/loads primer data file to MAD and allows PCR data entry, and translation of 96 → 384. (Alex Saeed, developer and maintainer; enhancements: Wedge Smith)
Diagram labels: Selected Genes; Primer Design; Clone Selection; Primer Synthesis; PCR Amplification; MAD; Gel-based Scoring
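Translating 96-well positions to a 384-well plate is usually a matter of interleaving four source plates into the quadrants of each 2x2 block of destination wells. The sketch below shows that common convention; it is an assumption for illustration, and the mapping actually used by PCR Scorer is not stated on the slide. The function name and the 0-3 plate index are hypothetical.

```python
import string

def to_384(plate_index, well):
    """Map a well on one of four 96-well plates to a 384-well position.

    plate_index: 0-3, selecting the quadrant within each 2x2 block
                 (assumed interleaved layout, not confirmed by the source).
    well: 96-well name such as "A1" or "H12".
    """
    if plate_index not in range(4):
        raise ValueError("plate_index must be 0, 1, 2 or 3")
    row = string.ascii_uppercase.index(well[0].upper())  # A -> 0 ... H -> 7
    col = int(well[1:]) - 1                              # 1 -> 0 ... 12 -> 11
    if not (0 <= row < 8 and 0 <= col < 12):
        raise ValueError(f"not a 96-well position: {well}")
    row_384 = 2 * row + plate_index // 2   # rows A-P
    col_384 = 2 * col + plate_index % 2    # columns 1-24
    return f"{string.ascii_uppercase[row_384]}{col_384 + 1}"


for plate in range(4):
    print(plate, to_384(plate, "A1"), to_384(plate, "H12"))
```

With this layout every one of the 4 x 96 source wells lands on a distinct destination well, so the mapping can be inverted to trace a 384-well spot back to its source plate.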

The Beast: Microarray Robot from Intelligent Automation <http://www.ias.com>

Additional Software for Arrays: Scheduler
Microarray Scheduler: Allows scheduling of all instruments
Designed and maintained by Jerry Li
Available with source

Microarray Overview
SliTrack/Controller: Takes Slide Order and Run parameters, generates spot order and IAS control file, launches IAS run software, loads database. (J. Li, developer and maintainer)
Diagram labels: Amplified/Purified Genes Loaded in Arrayer; Run Parameters Set; Slides Printed; MAD

Microarray Overview II
Diagram labels: Measure Fluorescence in 2 channels red/green; Control; Hybridize, Wash; Analyze the data to identify patterns of gene expression; Test; Prepare Fluorescently Labeled Probes

Microarray Overview II
Diagram labels: Measure Fluorescence in 2 channels red/green; Weed Control; Hybridize, Wash; Analyze the data to identify patterns of gene expression; Test Bush; Prepare Fluorescently Labeled Probes

Microarray Overview II
Diagram labels: Measure Fluorescence in 2 channels red/green; Control; Hybridize, Wash; Obtain RNA Samples; Analyze the data to identify differentially expressed genes; Test; Prepare Fluorescently Labeled Probes
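A two-channel measurement is commonly summarised per spot as the log2 ratio of test to control intensity. The sketch below shows that calculation with a simple fold-change cutoff; the cutoff, the `min_intensity` filter, and the function names are assumptions for illustration, and a fixed fold threshold is a simplification rather than the analysis method used by the speaker's group.

```python
import math

def log2_ratios(spots, min_intensity=1.0):
    """spots: iterable of (gene_id, test_intensity, control_intensity).

    Intensities are assumed to be background-subtracted already.
    Spots below min_intensity in either channel are skipped.
    """
    ratios = {}
    for gene, test, control in spots:
        if test < min_intensity or control < min_intensity:
            continue
        ratios[gene] = math.log2(test / control)
    return ratios

def differentially_expressed(ratios, fold=2.0):
    """Genes whose test/control ratio changes by at least `fold` either way."""
    cutoff = math.log2(fold)
    return sorted(g for g, r in ratios.items() if abs(r) >= cutoff)


spots = [
    ("g1", 4000.0, 1000.0),  # up in test
    ("g2", 1000.0, 1000.0),  # unchanged
    ("g3", 500.0, 2000.0),   # down in test
    ("g4", 0.0, 100.0),      # too dim in one channel, skipped
]
ratios = log2_ratios(spots)
print(ratios)
print(differentially_expressed(ratios))
```

Working in log2 space makes a twofold increase and a twofold decrease symmetric (+1 and -1), which is why the cutoff is applied to the absolute value.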

Microarray Overview
MADAM: Allows data entry (J. Li & J. White, web prototype; A. Saeed, J. White, J. Li, & V. Sharov, developers)
Diagram labels: Control; Test; Hybridize, Wash; Prepare Fluorescently Labeled Probes; MAD; Obtain RNA Samples
