Making sense of large amounts of molecular data

Making sense of large amounts of molecular data. Jason E. McDermott, PhD, Research Scientist, Computational Biology and Bioinformatics Group, Pacific Northwest National Laboratory. How do components of biological systems interact to produce behavior? Nucleic Acids. Proteins. Macromolecular


Presentation Transcript


  1. Making sense of large amounts of molecular data. Jason E. McDermott, PhD, Research Scientist, Computational Biology and Bioinformatics Group, Pacific Northwest National Laboratory

  2. How do components of biological systems interact to produce behavior? [Diagram labels: Nucleic acids; Proteins; Macromolecular complex]

  3. Molecular pathways: mTOR pathway; EGFR pathway (source: http://biocarta.com)

  4. A Mammoth Problem

  5. Scientific Method Overview [Diagram labels: Hypothesis; Experimental design; Interpretation; Data generation; Predictions; Analysis/modeling]

  6. Circumstantial Evidence
      Traditional experimental approach
      • Cigarette butt on street
      • Neighbor was eyewitness to crime
      • Missing jewelry from the house
      • Fingerprints on doorknob
      High-throughput experimental approach
      • Cigarette sales in city
      • Testimony from everyone on the block
      • All diamonds sold over last year in 10-mile radius
      • Fingerprints on every surface in the house

  7. Problem
      • New methods generating mountains of data
      • Very complex systems
      • Traditional methods fail in some cases
      • Progress will be made through better use of this data
      Objectives
      • Formulate hypotheses for further investigation
      • Identify gene/protein ‘targets’
      • Identify pathways that drive disease
      • Develop systems-level biological understanding

  8. What is a ‘target’?
      • ‘Critical nodes’
      • Regulators of important processes
      • Outcome of modeling (a prediction) that can be used to formulate a hypothesis
      What are targets used for?
      • Mechanistic understanding of disease processes
      • Potential biomarkers of disease
      • Potential therapeutic treatments: drug development

  9. Examples I’ll be talking about
      • Bacterial virulence (Salmonella Typhimurium)
      • Viral pathogenesis (avian flu and SARS)
      • Ovarian cancer
      Approaches I’ll be talking about
      • Machine learning
      • Biological networks
      • Data integration

  10. Salmonella Typhimurium [Diagram labels: Pathogen; Host; Pathogen directed; Host directed; Bacterial detection; Invasion; Bacterial survival; Virulence activation; Environmental modulation; Environmental response; Host defense; LPS; TLR4; MEK; ERK; Egr-1; SPI1+; SPI2-T3S effectors (e.g. SifA, SlrP, SseJ, SspH2); SCV; ssrA/B; ompR/envZ; phoP/Q; ydgT; pH; Mg2+; Fe2+; ROS/RNS; iNOS; NRAMP]

  11. Type-III secretion system secreted effectors: SlrP, SspH2, SseI, SseJ, SifA, SifB, SpvB, SseK-1, SopD-1, InvJ, SipC, +25 other known effectors, +??? other unknown effectors. Credits: Karou Geddes; http://en.wikipedia.org/

  12. Overview of the SVM-based Identification and Evaluation of Virulence Effectors (SIEVE) Method

  13. SVM-based Discrimination [Plot labels: Positive; Negative; D1; D2]
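
As a rough illustration of the SVM-based discrimination on slide 13, here is a minimal sketch of a linear SVM trained by batch subgradient descent on the L2-regularized hinge loss. The two features stand in for the D1 and D2 plot labels. The data points, the `train_linear_svm` and `predict` names, and the hyperparameters are all hypothetical; this is not the SIEVE implementation, whose features and kernel are not given in this transcript.

```python
# Toy linear SVM (hinge loss + L2 penalty) trained by batch subgradient descent.
# All data and hyperparameters below are made up for illustration.

def train_linear_svm(points, labels, lam=0.01, lr=0.01, epochs=200):
    """Return (weights, bias) for labels in {+1, -1}."""
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the penalty (lam / 2) * ||w||^2
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # inside the margin: the hinge loss is active
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Classify a point as +1 (positive) or -1 (negative)."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical (D1, D2) feature vectors: +1 = positive class, -1 = negative class.
points = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (3.0, 2.0),
          (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0), (-3.0, -2.0)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
predictions = [predict(w, b, p) for p in points]
```

On this well-separated toy set the learned hyperplane classifies every training point correctly. Real sequence-derived features would need scaling, cross-validation, and most likely a library SVM rather than this hand-rolled loop.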

  14. SIEVE Validation Using CyaA Fusions
      • McDermott, et al. 2011. Infection and Immunity. 79(1):23-32
      • Niemann, et al. 2011. Infection and Immunity. 79(1):33-43

  15. Biological Networks
      • Types of networks
        • Regulatory networks
        • Protein-protein interaction networks
        • Biochemical reaction networks
        • Association networks
      • Network
        • Node = gene/protein or other component
        • Edge = inferred relationship between components
      • McDermott JE, et al. 2010. Disease Markers, 28(4):253-66.

  16. Merging disparate observations of a system to produce a single, more informative view [Diagram labels: Genome; SNVs; CNVs; Comparison; methylation; Pathway enrichment; mRNA; LEAP; miRNA; Network analysis; protein phosphorylation; metabolome]

  17. Network inference method
      Can we infer a relationship between two genes or proteins based on their expression profiles over a large number of different conditions?
      [Figure labels: conditions; gene; A; B; C]
      Faith, J., et al. “Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles.” 2007. PLoS Biology 5:e8
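
The question on slide 17 can be sketched in a few lines: score every pair of genes by how similar their expression profiles are across conditions, and keep the pairs above a cutoff as network edges. The sketch below uses plain Pearson correlation as a simple stand-in; the cited Faith et al. paper uses a mutual-information-based method, which is not reproduced here. The gene names A, B, C echo the slide's figure labels, while the expression values, the 0.9 threshold, and the function names are assumptions for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sd_x * sd_y)


def infer_edges(profiles, threshold=0.9):
    """Return (gene1, gene2, r) for every pair with |r| at or above the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression values for three genes over five conditions.
profiles = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.1, 3.9, 6.2, 8.1, 9.8],
    "C": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges = infer_edges(profiles)
```

With these made-up numbers only the A-B pair passes the cutoff, because B tracks A closely while C does not. An edge here means "inferred relationship" in the sense of slide 15, not a demonstrated regulatory interaction.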

  18. What are networks useful for?
      • Networks can be used for:
        • Pretty figures
        • Hypothesis generation
        • Functional modules and their organization
        • Topological identification of target critical nodes
        • Predicting future states of the network
      • Networks are NOT useful for:
        • Final mechanistic insight
        • Fine distinction of types of interactions between components
        • Causality

  19. Hubs
      • High centrality, highly connected
      • Exert regulatory influences
      • Vulnerable
      Bottlenecks
      • High betweenness
      • Regulate information flow within network
      • Removal could partition network
      Yu H et al. PLoS Comp Biol 2007, 3(4):e59
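
To make the hub/bottleneck distinction on slide 19 concrete, the following sketch computes degree (the hub measure) and betweenness centrality (the bottleneck measure) on a small made-up network. Betweenness is computed with Brandes' algorithm for unweighted, undirected graphs. The node names, the edge list, and the helper names are hypothetical and are not taken from the talk or the cited papers.

```python
from collections import deque


def build_adjacency(edge_list):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    """Number of neighbors per node; high values mark hubs."""
    return {v: len(neighbors) for v, neighbors in adj.items()}


def betweenness(adj):
    """Brandes' algorithm: shortest paths passing through each node."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                      # nodes in order of non-decreasing distance from s
        preds = {v: [] for v in adj}    # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}     # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while order:                    # accumulate dependencies, farthest node first
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: score / 2 for v, score in bc.items()}  # each undirected pair was counted twice


# Two triangles (a, b, c) and (d, e, f) joined only through node x.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"),
             ("c", "x"), ("x", "d"),
             ("d", "e"), ("d", "f"), ("e", "f")]
adj = build_adjacency(toy_edges)
deg = degree(adj)
bc = betweenness(adj)
```

In this toy graph, node x has only two neighbors but the highest betweenness, since every shortest path between the two triangles runs through it; removing it partitions the network. Nodes c and d have the highest degree instead.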

  20. Bottlenecks in Salmonella are essential for virulence
      • McDermott J, et al. 2009. J. Comp. Bio. 16(2):169-180

  21. Discovery of a novel class of effectors by integrating transcriptomic and proteomic networks

  22. Respiratory virus pathogenesis
      • What are the causes of pathogenesis in respiratory viruses?
      • Goal: Identify and prioritize potential mediators of pathogenesis that are common and unique to influenza and SARS
      • Goal: Identify and prioritize potential mediators of high-pathogenicity viral infection
      • Approach:
        • Mouse models of infection
        • Transcriptomics
        • Network-based approach
        • Topological network analysis to define targets
        • Validation studies

  23. SARS-CoV-infected Wildtype Mouse Inferred Network [Figure labels: Ido1/Tnfrsf1b module; Kepi module]

  24. Hypotheses for Validation [Diagram labels: KO mouse; Infection; Survival; Death; Phenotype (Altered / Negative); Network (Altered / Negative)]

  25. Predicted targets abrogate influenza pathogenesis [Figure panels: H5N1 infection; SARS infection]
      • Tnfrsf1b (a.k.a. Tnfr2)
      • Predicted common regulator for influenza and SARS pathogenesis
      • Tnfa binding
      • Negatively regulates TNFR1 signaling, which is proinflammatory
      • Promotes endothelial cell activation/migration
      • Activation and proliferation of immune cells

  26. [Figure: only the axis tick labels 10, 5, 0, -5 survive in the transcript]

  27. Biological Drivers in Ovarian Cancer
      • What genomic characteristics of ovarian cancer are executed at the protein level?
      • Can protein expression be used to identify the most important genomic changes?
      • How can we improve the survival of women with ovarian cancer?
      • Can proteomics provide insight into the biological processes associated with poor survival?
      • Can we use a pathway-based approach to suggest novel therapeutic targets?

  28. Proteomics
      • Chemoresistance in ovarian and breast cancer
      • Tumor samples from The Cancer Genome Atlas
      • Depth of genomic characterization
      • Many tumors
      • Proteomics and phosphoproteomics characterization of these tumors
      • Pathway/network analysis to reveal patterns and biomarkers
      • Integrate data into single view of the system

  29. Clustering of Proteins and Phosphoproteins [Figure labels: Phosphoproteins; Proteins; iTRAQ batch; Proteomic subtypes; Transcriptomic subtype; Log2 abundance relative to universal reference pool]

  30. A Subset of Proteins and Phosphopeptides Correlate with Patient Survival
      [Figure panels: Phosphorylation (normalized to abundance); Protein abundance]
      Linear regression of abundance versus days-to-death suggests possible correlations with patient survival
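
The regression mentioned on slide 30 can be sketched as an ordinary least-squares fit of abundance against days-to-death for one protein. The `fit_line` helper and the four data points are hypothetical; the values are chosen to lie exactly on a line so the result is easy to check by hand, which real patient data would not do.

```python
def fit_line(x, y):
    """Ordinary least squares fit y = slope * x + intercept, plus Pearson r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Hypothetical values for a single protein across four patients.
days_to_death = [100, 200, 300, 400]
log2_abundance = [1.0, 0.5, 0.0, -0.5]

slope, intercept, r = fit_line(days_to_death, log2_abundance)
```

A negative slope, as in this toy case, would mean higher abundance goes with shorter survival. A fit like this only uses patients with an observed death and ignores censoring, which is one reason the slide speaks of "possible correlations" only.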

  31. PDGFRB Pathway [Figure legend: Correlated with short survival; Weak correlation; Correlated with long survival; Weak correlation; Not observed. Data types: mRNA abundance; phosphorylation; protein abundance]

  32. Integrated Co-abundance Network for Ovarian Cancer [Figure labels: Module 1 (short survival); Module 2 (long survival). Legend: Correlated with short survival; Correlated with long survival; Protein; Phosphorylated protein; mRNA]

  33. Survival Analysis from Network Targets
      Kaplan-Meier plots from integrated CNV, mRNA expression, and mutations
      [Two panels, axes: Months survival and % survival; P-values 0.005 and 0.007; genes shown: ATF3, DUSP1, FOSB, ZFP36, IGKV1-5, LAX1, AMPD1, IGHM, SLAMF7]
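
For readers unfamiliar with the Kaplan-Meier plots named on slide 33, here is a minimal sketch of the estimator itself: at each time with an observed death, the running survival probability is multiplied by (1 - deaths / number still at risk), and censored patients simply leave the at-risk pool. The `kaplan_meier` function name and the five follow-up records are hypothetical and unrelated to the data behind the slide.

```python
def kaplan_meier(samples):
    """Kaplan-Meier estimate from (time, death_observed) pairs.

    Returns a list of (time, survival_probability) at each time with a death.
    """
    ordered = sorted(samples)
    at_risk = len(ordered)
    survival = 1.0
    curve = []
    i = 0
    while i < len(ordered):
        t = ordered[i][0]
        deaths = 0
        leaving = 0
        while i < len(ordered) and ordered[i][0] == t:  # group ties at the same time
            if ordered[i][1]:
                deaths += 1
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve


# Hypothetical follow-up in months; False means censored (alive at last contact).
follow_up = [(6, True), (12, True), (12, False), (20, True), (30, False)]
curve = kaplan_meier(follow_up)
```

For this toy cohort the estimated survival steps down to about 0.8, 0.6, and 0.3 at months 6, 12, and 20. Comparing two such curves, as the slide's P-values imply, would additionally need a test such as the log-rank test, which is not shown here.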

  34. Conclusions
      • Several effective ways of big data integration
        • Machine learning approaches
        • Biological network representation
        • Data integration
      • Understanding of disease requires system-level views
      • Relatively simple approaches can yield novel insight
      • Combining different views of system can improve insight
      • Data analysis and modeling is a starting point, not an end point

  35. Acknowledgements
      • SysBEP (http://www.sysbep.org)
        • NIAID/NIH Y1-AI-8401
        • PI: Josh Adkins, PNNL
      • Systems Virology (http://www.systemsvirology.org)
        • NIAID/NIH HHSN272200800060C
        • PI: Michael Katze, UW
      • Clinical Proteomics Tumor Analysis Consortium
        • NCI/NIH 1U24CA160019
        • PIs: Richard Smith, PNNL; Karin Rodland, PNNL
      • Many, many people in these and other projects who helped with this work and made it possible

  36. About Me
      • Email: Jason.McDermott@pnnl.gov
      • About: http://www.jasonya.com/wp/about/
      • Twitter: @BioDataGanache
      • Blog: The Mad Scientist’s Confectioner’s Club (http://www.jasonya.com/wp/)
