
Managing and Exploiting “Post-Genomics Era” Data



Presentation Transcript


  1. Managing and Exploiting “Post-Genomics Era” Data David O. Nelson Matt Coleman Lawrence Livermore National Laboratory

  2. The “New” Biology: X-omics • Traditional reductionistic approach: • One gene/protein/reaction at a time. • Test/validate isolated models at bench. • New “systems” approach: • All DNA/RNA/proteins surveyed at once. • Need to • Manage data globally (across labs, sites, …) • Analyze large batches of intermediate results. • Provide links to minute details when required.

  3. Outline • Introduction to DOE Low-Dose Program • Microarrays • Overview and analyses • Microarrays at LLNL • An Example Project • Gene regulation after exposure to Ionizing Radiation (IR)

  4. The DOE Low-Dose Radiation Research Program • Goal: develop radiation standards based on risk. • Focus: biological mechanisms of radiation response • low-dose (< 0.1 Gy) • low dose-rate (< 0.1 Gy / Yr) • Scope: • ~54 projects • See http://lowdose.org

  5. Microarrays in the Low-Dose Program Use microarrays to • Identify candidate low-dose biomarkers of radiation • exposure, • early cellular response, • downstream effects, and • susceptibility. • Assess risk through mechanism-based understanding of cell and tissue response. • Genomic regulation of low dose response. • Identifying affected biological pathways and functions. • Predicting novel biological pathways and functions.

  6. What Is A Microarray? • Microarray—a 2d array of spots on a glass slide. • Each spot contains DNA (or RNA). • Usually different DNA on each spot. • Some within-slide reps for QC. • Make bunches at a time and hybridize with tissue extracts.

  7. Hybridize Arrays With Tissue Extracts • An array is simultaneously exposed to one or more tissue extracts (different treatments). • DNA in each extract labeled with fluorescent tag. • Different tag for each tissue extract. • DNA in tissue sticks (hybridizes) with its mate on slide. • Competitive hybridization when >1 tissue. • Quantity hybridized ~ concentration.

  8. Scan Arrays to Acquire Concentration Data • Fluorescent tags are excited by laser scanner. • Intensity is read off by photomultiplier tubes (PMTs). • One band-pass filter and PMT per fluorophore. • Result is an intensity image in 1+ spectral bands.

  9. The “Raw” Data Is Really Highly Cooked • Find ROI for each spot. • Estimate background outside of spots. • Eliminate background. • Combine intensities from each pixel to estimate signal. • Assess quality of estimate. • Result: an n x k array of intensities for hybridization. • n spots, k colors.
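The processing steps on this slide can be illustrated with a minimal sketch for a single spot in a single channel. It assumes the spot's region of interest and a ring of nearby background pixels have already been found; the function name `quantify_spot`, the use of medians, and the crude quality score are illustrative assumptions, not the method used by any particular scanner software.

```python
from statistics import median

def quantify_spot(image, roi, background_ring):
    """Estimate a background-corrected signal for one spot in one channel.

    image: 2D list of pixel intensities (list of rows).
    roi: list of (row, col) pixel coordinates inside the spot (assumed known).
    background_ring: list of (row, col) coordinates just outside the spot.
    """
    spot_pixels = [image[r][c] for r, c in roi]
    bg_pixels = [image[r][c] for r, c in background_ring]

    background = median(bg_pixels)    # robust local background estimate
    foreground = median(spot_pixels)  # combine spot pixels into one value
    signal = max(foreground - background, 0)  # clip negative results at zero

    # Crude quality score: fraction of spot pixels brighter than background.
    quality = sum(p > background for p in spot_pixels) / len(spot_pixels)
    return signal, quality

# Toy 4x4 image with one bright 2x2 spot in the middle.
image = [
    [10, 10, 10, 10],
    [10, 90, 110, 10],
    [10, 100, 100, 10],
    [10, 10, 10, 10],
]
roi = [(1, 1), (1, 2), (2, 1), (2, 2)]
ring = [(r, c) for r in range(4) for c in range(4) if (r, c) not in roi]
signal, quality = quantify_spot(image, roi, ring)
```

Repeating this for n spots and k channels gives the n x k intensity array the slide describes.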

  10. Two Types of Microarray Systems

  11. Experiments with cDNA Microarrays • Two probes per experiment • Probes are cDNA from tissues • Labeled with different dyes • Data are intensities from two band-pass filters • Usually red and green
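Two-channel intensities are usually compared on a log scale. The sketch below computes, for each spot, the log2 red/green ratio (often called M) and the average log2 intensity (often called A). The M/A convention and the `floor` parameter are assumptions added for illustration; the slides do not name a specific transformation.

```python
import math

def log_ratios(red, green, floor=1.0):
    """Return a list of (M, A) pairs, one per spot.

    M = log2(R / G) is the log ratio; A is the mean log2 intensity.
    Intensities below `floor` are raised to `floor` to avoid log(0).
    """
    results = []
    for r, g in zip(red, green):
        r, g = max(r, floor), max(g, floor)
        m = math.log2(r) - math.log2(g)
        a = 0.5 * (math.log2(r) + math.log2(g))
        results.append((m, a))
    return results

# Three hypothetical spots: up in red, unchanged, down in red.
ratios = log_ratios([400.0, 100.0, 50.0], [100.0, 100.0, 200.0])
```

A positive M means the red-labeled extract hybridized more strongly than the green-labeled one at that spot; M near zero means no apparent difference.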

  12. Custom Arrays at LLNL • ~1200 interesting genes • Genes associated with radiation effect • IR modulated • associated with differential IR response • Genes for DNA repair and stress response • DNA repair • cell cycle control • Apoptosis • stress response • meiosis genes • Developmental and spermatogenesis genes • Mouse • 50 Gene Array: includes ~15 radiation response genes • 403 Gene Array: includes ~100 radiation response genes • 833 Gene Array: includes ~200 radiation response genes • Human • 99 Gene Array: includes ~40 radiation response genes • ~500 Gene Array: includes ~200 radiation response genes • ~800 Gene Array: includes ~200 radiation response genes

  13. Oligonucleotide Microarrays from Affymetrix (Affy Arrays) • One probe per experiment • Probes are labeled RNA • Data is intensity from one band-pass filter • Treatment and blocking completely confounded

  14. Available Affy Arrays • The Human Genome U95 and U133 Set (6 chips) • Comprehensive transcript coverage for the human genome. • Study the expression level of >60,000 human genes. • Mouse Genome U74 Set (3 chips) • Biggest mouse genome gene set currently available. • 36,000 mouse genes and ESTs • Others available • SNP Analysis • Cancer Set • Yeast, Fly, E. coli, Rat

  15. Problems in Analyzing Microarray Data • Experimental design creates problems. • Chip-to-chip variation confounded with treatment differences. • Normalization is used to try to adjust. • Only one or two treatments per chip. • Treatment comparisons more complex. • Modern designs can help. • “Best” way to arrange and process data still in flux. • How to pick the winners with 10^4 tests? • Multiple testing vs. exploratory analyses.
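To make the multiple-testing question concrete, the sketch below applies the Benjamini-Hochberg false discovery rate procedure to a list of per-gene p-values and compares it with a Bonferroni cutoff. The slides do not say which correction the authors used; this procedure is swapped in as one common choice, and the p-values are made up for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return sorted indices of tests declared significant at the given FDR.

    Finds the largest rank k such that p_(k) <= fdr * k / m and selects
    the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

# Hypothetical p-values for ten genes.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
selected = benjamini_hochberg(p, fdr=0.05)

# Bonferroni for comparison: each test must pass alpha / m.
bonferroni = [i for i, v in enumerate(p) if v <= 0.05 / len(p)]
```

With 10^4 tests the Bonferroni cutoff becomes 0.05 / 10^4 = 5 x 10^-6 per gene, which is why less conservative criteria or purely exploratory analyses are often considered.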

  16. Experiments at LLNL and Elsewhere • Mouse IR Gene Expression • Tissue: Brain, Testis • Dose: 0, 0.1, 2.0 Gy • Time: 0.5, 4 hours • Human IR Gene Expression • Tissue: 3 cell lines • Dose: 0, 2.0, 0.05+2.0, 0.1 Gy • Time: 4 hours • Y. pestis • 3 strains, 2 host cell lines • Temp: 20° and 37° C • Time: 1.0 and 10 hours • Mouse Development • 5 early embryo stages • 5 spermatogenesis stages • 5+ mutagens • Other IR-related labs • A. Fornace, NCI • G. Chu, Stanford • B. Lehnert, LANL • D. Chen, LBNL

  17. Infrastructure Development Underway at LLNL • Web-based data acquisition and storage. • Open-source tools in R for describing designs and analyses. • HTML and XML tools for results presentation. • Integrate with SDM for exploitation of downstream tools.

  18. LLNL’s Current Local Data Storage System Doesn’t Scale

  19. What Information Must Be Stored: MGED Effort • Minimum Information About a Microarray Experiment (MIAME) • Experimental Design • Array Design • Samples • Hybridizations • Measurements • Controls • Interchange Formats, Ontologies, etc. • http://www.mged.org
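One way to picture the six categories listed on this slide is as a single record per experiment. The sketch below is a hypothetical container, not the MGED interchange format; the class name `ExperimentRecord`, the field types, and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field, asdict

@dataclass
class ExperimentRecord:
    """Hypothetical record holding the six categories named on the slide."""
    experimental_design: str = ""
    array_design: str = ""
    samples: list = field(default_factory=list)
    hybridizations: list = field(default_factory=list)
    measurements: dict = field(default_factory=dict)
    controls: list = field(default_factory=list)

    def missing_sections(self):
        """Return the names of categories that are still empty."""
        return [name for name, value in asdict(self).items() if not value]

# Illustrative values only.
record = ExperimentRecord(
    experimental_design="dose series: 0, 0.1, 2.0 Gy; 4 hours",
    array_design="custom cDNA array",
    samples=["brain", "testis"],
)
missing = record.missing_sections()
```

A check like `missing_sections` could be run before a data set is accepted into shared storage, so that incomplete submissions are caught early.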

  20. Analytical Tools Are Developing Rapidly • Bioconductor project. • Open-source tools for microarray analysis. • http://www.bioconductor.org • Statisticians at HSPH, Lucent, UC Berkeley, Stanford, Johns Hopkins, LLNL, etc. are involved. • Mike Eisen at LBNL is one of the biology pioneers.

  21. Summary: Microarray Data Integration Needs • Current experiments will produce 10^10 – 10^12 bytes of data. • Planned experiments much more. • Must integrate with data from • Intermediate results/analyses. • Local data repositories. • External sequence/protein data from NCBI and elsewhere. • External analysis tools. • Tool set undergoing rapid development.

  22. Example Project: Genome-scale Modeling of IR Gene Networks • Hypothesis: Similar expression patterns in response to low-dose IR => genes in coordinated expression groups. • Significance: Understanding regulation of expression groups will help • Understand biological processes. • Identify determinants of IR susceptibility.
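The hypothesis on this slide, that genes with similar expression patterns form coordinated groups, can be sketched by grouping genes whose profiles are highly correlated. The code below links any two genes with Pearson correlation at or above a threshold and reports the connected groups. The gene names, profiles, and threshold are hypothetical, and this simple linking rule stands in for whatever cluster analysis the authors actually used.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_groups(profiles, threshold=0.9):
    """Group genes connected by correlations >= threshold.

    profiles: dict mapping gene name -> list of expression values.
    Returns a list of groups, each a sorted list of gene names.
    """
    genes = list(profiles)
    unassigned = set(genes)
    groups = []
    for gene in genes:
        if gene not in unassigned:
            continue
        unassigned.discard(gene)
        group, frontier = [], [gene]
        while frontier:
            current = frontier.pop()
            group.append(current)
            for other in genes:
                if other in unassigned and pearson(
                    profiles[current], profiles[other]
                ) >= threshold:
                    unassigned.discard(other)
                    frontier.append(other)
        groups.append(sorted(group))
    return groups

# Hypothetical expression profiles across four conditions.
profiles = {
    "geneA": [0.0, 1.0, 2.0, 3.0],
    "geneB": [0.0, 2.0, 4.0, 6.5],
    "geneC": [3.0, 2.0, 1.0, 0.0],
    "geneD": [6.0, 4.0, 2.2, 0.0],
}
groups = coexpression_groups(profiles, threshold=0.9)
```

Each resulting group is a candidate set of co-regulated genes whose upstream sequences can then be compared.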

  23. Building and Extending a Promoter Model • Find interesting genes using microarrays. • Obtain cDNA sequences for genes of interest. • BLAST cDNA sequences against Unfinished High-Throughput Genomic Sequences or “Nonredundant Databases”. • Identify start of transcription based on cDNA-genomic sequence alignment. • Select 1000 bases in front of transcription site. • Analyze sequences for transcription factor binding sites (TFBSs) using ModelInspector. • Build a consensus model using location and consensus of TFBSs. • Search for other genes with same promoter model. • Compare new genes with genes already in group.

  24. Promoter Model Discovery Workflow • 1. Do Microarray Experiment • 2. Extract cDNA Cluster(s) (A, B, C) • 3. Search DBMSs for Sequences • 4. Extract Upstream Sequences • 5. Identify Potential Promoters • 6. Hypothesize Promoter Model • 7. Search DBMSs for Other Genes • 8. Potential Genes • Adapted from Thomas Werner, Biomolecular Engineering, 17: 87-94 (2001)

  25. Step 1: Get Genes from, e.g., Cluster Analysis

  26. Step 2&3: Blast cDNA vs. GenBank…four big hits

  27. Assume 1st Position of cDNA is Start of Transcription • Need 1000 bases upstream. cDNA Genomic

  28. Step 3: Get and Annotate 1kb Upstream • User-added annotation for use in later analysis (loc, clone & cDNA accession number, direction)
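Extracting the 1 kb upstream region can be sketched as follows, assuming the genomic sequence is available as a string and the transcription start (the first cDNA position in the alignment) is known as a 0-based index. The function names and the strand handling are illustrative assumptions; the slides do not show how the authors implemented this step.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def upstream_region(genomic, tss, strand="+", length=1000):
    """Return up to `length` bases upstream of the transcription start.

    genomic: forward-strand genomic sequence.
    tss: 0-based index of the transcription start on the forward strand.
    strand: '+' if the gene reads left to right, '-' if it reads right to left.
    The result is truncated if the sequence ends before `length` bases.
    """
    if strand == "+":
        start = max(tss - length, 0)
        return genomic[start:tss]
    if strand == "-":
        end = min(tss + 1 + length, len(genomic))
        # Upstream of a minus-strand gene lies to the right on the forward
        # strand, so take that slice and reverse-complement it.
        return reverse_complement(genomic[tss + 1:end])
    raise ValueError("strand must be '+' or '-'")

genomic = "AACCGGTTACGT"
plus_up = upstream_region(genomic, tss=4, strand="+", length=3)
minus_up = upstream_region(genomic, tss=7, strand="-", length=3)
```

The annotation the slide mentions (location, clone and cDNA accession number, direction) would be stored alongside the returned sequence so later analyses can trace it back to its source.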

  29. Step 4: Use External Tool Made for Promoter Analysis • http://genomatix.gsf.de

  30. Tool Knows About Promoters and Promoter Context

  31. Resulting Aligned Promoter Region with Predictions

  32. A Putative Model for the HK2 Cluster… • Figure labels: Transcription factor binding sites • DNA • Start of transcription • Putative model for down regulation

  33. Subsequent Steps… • Go back into GenBank and find other sequences with same promoter patterns. • May reflect genes that are co-regulated. • Figure out how new genes fit into picture.
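Searching for other sequences with the same promoter pattern can be illustrated with a toy model: an ordered list of exact motifs, each of which must begin within a maximum gap after the previous motif ends. This is far simpler than the weight-matrix models used by tools such as ModelInspector, and the motifs, gap size, and function name here are hypothetical.

```python
def find_model(seq, motifs, max_gap=50):
    """Return start positions of motifs matching the model, or None.

    The first motif may occur anywhere. Each later motif must start no
    more than `max_gap` bases after the end of the previous motif.
    Backtracks over alternative occurrences so a valid match is not missed.
    """
    def search(motif_index, window_start, window_end):
        if motif_index == len(motifs):
            return []
        motif = motifs[motif_index]
        pos = seq.find(motif, window_start)
        while pos != -1 and pos <= window_end:
            next_start = pos + len(motif)
            rest = search(motif_index + 1, next_start, next_start + max_gap)
            if rest is not None:
                return [pos] + rest
            pos = seq.find(motif, pos + 1)
        return None

    return search(0, 0, len(seq))

# Illustrative sequence and motifs only.
seq = "TTGACGTCAAAAATATAAAGG"
model = ["TGACGTCA", "TATAAA"]
hit = find_model(seq, model, max_gap=10)
miss = find_model(seq, model, max_gap=2)
```

Running such a scan over the upstream regions of many genes gives a candidate list, which then has to be compared with the genes already in the group.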

  34. What Do We Know Now? • Data management problems will be severe, by traditional biological standards. • Exploiting this data will require better tools for integrating disparate data, data bases, and analytics. • Must adapt rapidly to changing technology/scientific directions.

  35. Key Low-Dose Personnel and Collaborators • Expression array technology • Custom and Commercial Arrays • Image Capture & Processing • Paul VanHummelen, Rajiv Raja, Matt Coleman, Laura Kegelmeyer, Brenda Marsh, Don Peters, Shalini Mabery • Array Informatics • Image Clones, selection • David Nelson, Christa Prange, Tom Slezak, etc., Dave Wilson, Leif Peterson (Baylor U.), Jeff Gregg (UC Davis) • Mouse in vivo Model • Baseline • IR response • Lisa Cheeseman, Eric Yin • Human Lymphoblastoid Model • Effects of IR • Adaptive Response • Matt Coleman, Jim Tucker, Brenda Marsh, Karen Sorensen, Matt Coleman • Other Key Collaborators • J. Gregg, UC Davis, bioinformatics and hybridization technology • S. McCutchin-Maloney, LLNL, protein analyses • D. Wilson, LLNL, DNA repair • F. Marchetti, LLNL, cytogenetics
