
Network integration and function prediction: Putting it all together

Curtis Huttenhower, 04-13-11. Harvard School of Public Health, Department of Biostatistics. Outline: functional network integration; Bayes nets and LR; the human genome, tissues, and disease; network meta-analysis.

Presentation Transcript


  1. Network integration and function prediction: Putting it all together. Curtis Huttenhower, 04-13-11, Harvard School of Public Health, Department of Biostatistics

  2. Outline • Functional network integration • Bayes nets and LR • The human genome, tissues, and disease • Network meta-analysis • Pathogens and MTb • Quantifying progress in yeast • Networks to pathways • Functional mapping: networks of networks • Hierarchical integration • Pathway prediction • Regulatory network integration • Network motifs

  3. A computational definition of functional genomics. Inputs: prior knowledge and genomic data. Inferences: gene → function, gene → gene, data → function, function → function.

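  4. A framework for functional genomics: hundreds of millions (100Ms) of gene pairs and thousands (1Ks) of datasets. Figure: for each dataset, histograms of gene-pair frequency across evidence bins (expression correlation: low to high; colocalization: not colocalized vs. colocalized; sequence similarity: dissimilar to similar) are combined into a single probability of functional relationship, e.g. P(G2-G5 | Data) = 0.85.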
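One simple way to turn per-dataset evidence into a single probability like P(G2-G5 | Data) is naive Bayes integration, which assumes datasets are conditionally independent given whether a pair is functionally related. The sketch below is a minimal illustration under that assumption; the dataset names, evidence bins, likelihood values, and prior of 0.1 are hypothetical and not taken from the slides.

```python
import math

# Hypothetical per-dataset likelihood tables: P(observed bin | pair related)
# under True and P(observed bin | pair unrelated) under False.
LIKELIHOODS = {
    "expression": {True: {"low": 0.2, "high": 0.8}, False: {"low": 0.7, "high": 0.3}},
    "colocalization": {True: {"no": 0.4, "yes": 0.6}, False: {"no": 0.9, "yes": 0.1}},
    "sequence": {True: {"dissimilar": 0.5, "similar": 0.5},
                 False: {"dissimilar": 0.8, "similar": 0.2}},
}

def posterior_related(observations, prior=0.1, likelihoods=LIKELIHOODS):
    """Naive Bayes: combine per-dataset evidence into P(related | data).

    Datasets absent from `observations` are simply skipped, which is how a
    naive Bayes model marginalizes over missing evidence.
    """
    log_rel = math.log(prior)
    log_unrel = math.log(1.0 - prior)
    for dataset, value in observations.items():
        table = likelihoods[dataset]
        log_rel += math.log(table[True][value])
        log_unrel += math.log(table[False][value])
    # Normalize in log space for numerical stability with many datasets.
    m = max(log_rel, log_unrel)
    rel, unrel = math.exp(log_rel - m), math.exp(log_unrel - m)
    return rel / (rel + unrel)

p = posterior_related({"expression": "high", "colocalization": "yes", "sequence": "similar"})
print(round(p, 3))
```

With these made-up tables, a pair with high correlation, colocalization, and high sequence similarity gets a posterior of about 0.82; dropping a dataset from `observations` removes its term from the product.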

  5. MEFIT: A Framework for Functional Genomics. Figure: a functional relationship and its biological context (functional area, tissue, disease, …), with individual datasets (Golub 1999, Butte 2000, Whitfield 2002, Hansen 1998) as evidence.
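The slide does not show how context-specific probabilities are estimated. A common approach, sketched here as an assumption, is to count how often gold-standard related and unrelated gene pairs fall into each evidence bin, separately for each biological context, with pseudocounts so that unseen bins do not get probability zero. The contexts, bins, and example pairs below are hypothetical.

```python
from collections import Counter

def learn_likelihoods(examples, bins, pseudocount=1.0):
    """Estimate P(bin | related) and P(bin | unrelated) for one dataset.

    `examples` is an iterable of (bin, is_related) pairs from a gold standard.
    Pseudocounts (Laplace smoothing) keep unseen bins away from probability 0.
    """
    counts = {True: Counter(), False: Counter()}
    for b, related in examples:
        counts[related][b] += 1
    table = {}
    for label in (True, False):
        total = sum(counts[label].values()) + pseudocount * len(bins)
        table[label] = {b: (counts[label][b] + pseudocount) / total for b in bins}
    return table

def learn_context_specific(examples_by_context, bins, pseudocount=1.0):
    """Fit one likelihood table per biological context (functional area, tissue, disease)."""
    return {ctx: learn_likelihoods(ex, bins, pseudocount)
            for ctx, ex in examples_by_context.items()}

# Hypothetical gold-standard pairs, already restricted to each context.
gold = {
    "cell cycle": [("high", True), ("high", True), ("low", False), ("high", False)],
    "liver": [("low", True), ("high", False), ("low", False)],
}
tables = learn_context_specific(gold, bins=["low", "high"])
print(tables["cell cycle"][True])
```

Each resulting table can feed a naive Bayes posterior; training one table set per context is what makes the predictions context-specific.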

  6. Functional network prediction and analysis. Global interaction network: HEFalMp currently includes data from 30,000 human experimental results (15,000 expression conditions + 15,000 diverse others), analyzed for 200 biological functions and 150 diseases. Figure panels: metabolism network, signaling network, gut community network.

  7. HEFalMp: Predicting human gene function

  8. HEFalMp: Predicting human genetic interactions

  9. HEFalMp: Analyzing human genomic data

  10. HEFalMp: Understanding human disease

  11. Validating Human Predictions (with Erin Haley, Hilary Coller). Autophagy: 5½ of 7 predictions currently confirmed. Figure: predicted novel autophagy proteins LAMP2 and RAB11A compared with luciferase (negative control) and ATG5 (positive control), under not starved and starved (autophagic) conditions.

  12. Outline (repeated from slide 2)

  13. Meta-analysis for unsupervised functional data integration (Huttenhower 2006; Hibbs 2007; Evangelou 2007). Simple regression: all datasets are equally accurate. Random effects: variation within and among datasets and interactions.
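The random-effects side of this contrast can be made concrete with a standard meta-analysis of one gene pair's correlations across datasets. This sketch swaps in the DerSimonian-Laird estimator on Fisher z-transformed Pearson correlations; it is a textbook technique assumed here for illustration, not necessarily the exact model in the cited papers. Each dataset contributes a correlation r (strictly between -1 and 1) and a sample size n (its number of conditions), with within-dataset variance 1/(n - 3).

```python
import math

def fisher_z(r):
    """Fisher's z-transform of a Pearson correlation; variance is about 1/(n - 3)."""
    return 0.5 * math.log((1 + r) / (1 - r))

def random_effects(correlations, sample_sizes):
    """DerSimonian-Laird random-effects pooling of one gene pair's correlations.

    Returns (pooled_z, tau2): the pooled effect on the z scale and the estimated
    between-dataset variance. With tau2 == 0 this reduces to fixed-effect
    (inverse-variance) pooling.
    """
    z = [fisher_z(r) for r in correlations]
    v = [1.0 / (n - 3) for n in sample_sizes]  # within-dataset variances
    w = [1.0 / vi for vi in v]
    z_fixed = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, z))  # Cochran's Q
    k = len(z)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (vi + tau2) for vi in v]  # weights include between-dataset variance
    pooled = sum(wi * zi for wi, zi in zip(w_star, z)) / sum(w_star)
    return pooled, tau2

print(random_effects([0.9, 0.0, -0.5], [100, 100, 100]))
```

When the datasets agree, tau2 is 0 and the result equals inverse-variance pooling; when they disagree, tau2 grows and the weights become more uniform, so no single large dataset dominates.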

  14. Meta-analysis for unsupervised functional data integration (Huttenhower 2006; Hibbs 2007; Evangelou 2007)

  15. Unsupervised data integration: TB virulence and ESX-1 secretion (with Sarah Fortune). Graphle: http://huttenhower.sph.harvard.edu/graphle/

  16. Unsupervised data integration: TB virulence and ESX-1 secretion (with Sarah Fortune). Graphle: http://huttenhower.sph.harvard.edu/graphle/

  17–19. Predicting gene function. Figure: predicted relationships between genes (low to high confidence) for cell cycle genes. These edges provide a measure of how likely a gene is to specifically participate in the process of interest.
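A minimal way to score how specifically a gene participates in a process, assuming the integrated network is available as pairwise confidences, is to compare its average confidence to the process's known genes against its average confidence to all other genes. The ratio used here is a hypothetical choice for illustration; the slides do not specify the scoring function, and the gene names are placeholders.

```python
def process_scores(network, process_genes):
    """Rank genes by average predicted-relationship confidence to a process.

    `network` maps frozenset({g1, g2}) -> confidence in [0, 1]; missing pairs
    count as 0. Each score is the mean confidence to the process's known genes
    divided by the mean confidence to all other genes, which penalizes genes
    that have high confidence to everything.
    """
    genes = {g for pair in network for g in pair}

    def w(a, b):
        return network.get(frozenset((a, b)), 0.0)

    scores = {}
    for g in genes - set(process_genes):
        others = [h for h in genes if h != g]
        in_process = sum(w(g, h) for h in process_genes) / len(process_genes)
        background = sum(w(g, h) for h in others) / len(others)
        scores[g] = in_process / background if background > 0 else 0.0
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

# Hypothetical network: A and B are known process genes.
net = {
    frozenset(("A", "X")): 0.9, frozenset(("B", "X")): 0.8,
    frozenset(("X", "Y")): 0.1, frozenset(("A", "Y")): 0.1,
    frozenset(("Y", "Z")): 0.9,
}
print(process_scores(net, ["A", "B"]))
```

Here X ranks first because nearly all of its confidence is concentrated on the process genes, while Z, connected only outside the process, scores 0.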

  20. Comprehensive validation of computational predictions (with David Hess, Amy Caudy). Genomic data + prior knowledge → computational predictions of gene function (SPELL, Hibbs et al. 2007; bioPIXIE, Myers et al. 2005; MEFIT) → genes predicted to function in mitochondrion organization and biogenesis → laboratory experiments (growth curves, petite frequency, confocal microscopy) → new known functions for correctly predicted genes → retraining.

  21. Evaluating the performance of computational predictions: genes involved in mitochondrion organization and biogenesis. 106 original GO annotations; 135 under-annotations; 82 novel confirmations, first iteration; 17 novel confirmations, second iteration. 340 total: >3x previously known genes in ~5 person-months.
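The totals on this slide can be checked with a quick tally: the four categories sum to 340 genes, and 340 / 106 ≈ 3.2, matching the ">3x previously known genes" claim.

```python
# Gene counts from slide 21 (mitochondrion organization and biogenesis).
counts = {
    "original GO annotations": 106,
    "under-annotations": 135,
    "novel confirmations, first iteration": 82,
    "novel confirmations, second iteration": 17,
}
total = sum(counts.values())
fold_over_known = total / counts["original GO annotations"]
print(total, round(fold_over_known, 2))  # 340 3.21
```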

  22. Evaluating the performance of computational predictions: genes involved in mitochondrion organization and biogenesis. Computational predictions from large collections of genomic data can be accurate despite incomplete or misleading gold standards, and they continue to improve as additional data are incorporated. 106 original GO annotations; 95 under-annotations; 40 confirmed under-annotations; 80 novel confirmations, first iteration; 17 novel confirmations, second iteration. 340 total: >3x previously known genes in ~5 person-months.

  23. Outline (repeated from slide 2)

  24. Functional mapping: mining integrated networks. Figure: predicted relationships between chemotaxis genes (low to high confidence). The strength of these relationships indicates how cohesive a process is.

  25. Functional mapping: mining integrated networks. Figure: predicted relationships between chemotaxis genes (low to high confidence).

  26. Functional mapping: mining integrated networks. Figure: predicted relationships between chemotaxis and flagellar assembly genes (low to high confidence). The strength of these relationships indicates how associated two processes are.
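One way to quantify these two readings, assuming pairwise confidences from the integrated network, is to average the predicted relationships within a process (cohesiveness) or between two processes (association) and divide by the genome-wide average, so that 1.0 corresponds to the genomic background. This normalization is an illustrative assumption; the gene names and confidences in the example are hypothetical.

```python
from itertools import combinations
from statistics import mean

def edge(network, a, b):
    """Confidence for an unordered gene pair; unlisted pairs count as 0."""
    return network.get(frozenset((a, b)), 0.0)

def background_mean(network):
    """Average confidence over all pairs of genes that appear in the network."""
    genes = sorted({g for pair in network for g in pair})
    return mean([edge(network, a, b) for a, b in combinations(genes, 2)])

def cohesiveness(network, genes, background):
    """Mean within-process confidence relative to background (needs >= 2 genes)."""
    return mean([edge(network, a, b) for a, b in combinations(sorted(genes), 2)]) / background

def association(network, genes1, genes2, background):
    """Mean confidence between two processes relative to background."""
    return mean([edge(network, a, b) for a in genes1 for b in genes2 if a != b]) / background

# Hypothetical chemotaxis (c1, c2) and flagellar assembly (f1, f2) genes.
net = {
    frozenset(("c1", "c2")): 0.9,
    frozenset(("f1", "f2")): 0.8,
    frozenset(("c1", "f1")): 0.5,
}
bg = background_mean(net)
print(cohesiveness(net, ["c1", "c2"], bg), association(net, ["c1", "c2"], ["f1", "f2"], bg))
```

Values above 1.0 mark processes that are more cohesive, or pairs of processes more strongly associated, than the genomic background.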

  27–30. Functional mapping: Associations among processes. Figure: a map of processes including hydrogen transport, electron transport, cellular respiration, cell redox homeostasis, aldehyde metabolism, protein processing, peptide metabolism, vacuolar protein catabolism, negative regulation of protein metabolism, energy reserve metabolism, protein depolymerization, organelle fusion, and organelle inheritance. Legend: edges show associations between processes (moderately strong to very strong); nodes show the cohesiveness of processes (below baseline; baseline, i.e. genomic background; very cohesive); borders show data coverage of processes (sparsely covered to well covered).

  31. How do functional interactions become pathways? Data sources: • Gene expression • Physical PPIs • Genetic interactions • Colocalization • Sequence • Protein domains • Regulatory binding sites • …

  32. Simultaneous inference of physical, genetic, regulatory, and functional networks (with Chris Park, Olga Troyanskaya). From functional genomic data, infer: functional interactions, regulatory interactions, post-transcriptional regulation, phosphorylation, metabolic interactions, and protein complexes.

  33. Learning a compendium of interaction networks • Train one SVM per interaction type • Resolve consistency using a hierarchical Bayes net
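The first step on this slide, training one classifier per interaction type on the same gene-pair features, can be sketched with a small linear SVM. To stay dependency-free, the sketch uses Pegasos-style stochastic subgradient descent, a swapped-in training method; the slides do not say which SVM solver was used. The hierarchical Bayes net that resolves consistency across types is not shown, and the features and labels are hypothetical.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Train a linear SVM with Pegasos-style stochastic subgradient descent.

    X: list of feature vectors; y: labels in {-1, +1}. A constant 1.0 feature
    is appended so the bias is learned (and regularized) like any other weight.
    Returns (weights, bias).
    """
    rng = random.Random(seed)
    Xa = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xa[0])
    t = 0
    order = list(range(len(Xa)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xa[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 regularization step
            if margin < 1.0:  # hinge loss active: move toward this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return w[:-1], w[-1]

def predict(model, x):
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def train_compendium(features, labels_by_type, **kwargs):
    """Train one independent SVM per interaction type on shared gene-pair features."""
    return {itype: train_linear_svm(features, labels, **kwargs)
            for itype, labels in labels_by_type.items()}

# Hypothetical 2-feature vectors for six gene pairs, labeled for two types.
X = [[2, 2], [3, 1], [1, 3], [-2, -2], [-3, -1], [-1, -3]]
labels = {"physical": [1, 1, 1, -1, -1, -1], "genetic": [-1, -1, -1, 1, 1, 1]}
models = train_compendium(X, labels)
print({itype: [predict(m, x) for x in X] for itype, m in models.items()})
```

Training each type independently keeps the classifiers simple; any logical dependencies between types (for example, between physical and functional interactions) are left to the separate consistency step.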

  34. Learning a compendium of interaction networks. Both the presence/absence and the directionality of interactions are accurately inferred. Figure: AUC (axis from 0.5 to 1.0).
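The AUC on this slide summarizes how well predicted interaction scores separate true from false interactions. A minimal way to compute it, shown as a generic sketch rather than the authors' evaluation code, uses its equivalence to the Mann-Whitney statistic: the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting as half. An AUC of 0.5 is random ranking and 1.0 is perfect, matching the slide's axis.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: predicted confidences; labels: truthy for true interactions.
    Requires at least one positive and one negative.
    """
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

print(auc([0.9, 0.2, 0.5, 0.1], [1, 1, 0, 0]))
```

This pairwise loop is quadratic, which is fine for small evaluation sets; a sort-based rank computation scales better for genome-wide gold standards.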

  35. Using network compendia to predict complete pathways (with David Hess). An additional 20 novel synthetic lethality predictions were tested and 14 confirmed (>100x better than random). Legend: confirmed, unconfirmed.
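The ">100x better than random" figure can be unpacked with simple arithmetic: 14 of 20 confirmed is a 70% hit rate, so a 100-fold enrichment implies a background confirmation rate below 0.7% for random gene pairs. The slide does not give the background rate; the 0.5% used below is a hypothetical value chosen only to show the calculation, along with a binomial tail probability for seeing 14 or more confirmations by chance.

```python
from math import comb

def fold_enrichment(confirmed, tested, background_rate):
    """Observed hit rate divided by the expected rate for random predictions."""
    return (confirmed / tested) / background_rate

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more hits at random."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

background = 0.005  # hypothetical random-pair confirmation rate (not from the slides)
print(fold_enrichment(14, 20, background))  # about 140
print(binomial_tail(14, 20, background))
```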

  36. Interactive aligned network viewer: http://function.princeton.edu/bioweaver (Graphle)

  37. Outline (repeated from slide 2)

  38. Human Regulatory Networks. Quiescence: reversible exit from the cell cycle. Figure: expression of 6,829 genes in clusters I–X after serum starvation (1, 2, 4, 8, 24, 96 hrs) and serum re-stimulation (1, 2, 4, 8, 24, 48 hrs); cluster annotations include cell cycle, cholesterol, development, metabolism, RNA processing, and protein localization. Motif analysis with FIRE (Elemento et al. 2007); figure labels: Elk-1, G0, YY1, Sp1, NF-Y. • Of only five regulators found, four have generic cell cycle/proliferation targets • Just five basic regulators for ~7,000 genes? • These motifs only appear upstream of ~half of the genes

  39. COALESCE: Combinatorial Algorithm for Expression and Sequence-based Cluster Extraction. Inputs: gene expression; DNA sequence (upstream flank, 5’ UTR, 3’ UTR, downstream flank); evolutionary conservation; nucleosome positions. Procedure: • Create a new module • Feature selection (tests for differential expression/frequency): identify conditions where genes coexpress, and identify motifs enriched in the genes’ sequences • Bayesian integration: select genes based on conditions and motifs • Subtract mean from all data. Output, regulatory modules: • Coregulated genes • Conditions where they’re coregulated • Putative regulating motifs

  40. COALESCE: Selecting Coexpressed Conditions • For each gene expression condition, compare the distributions of values for genes in the module versus genes not in the module; if they differ significantly, include the condition • Preserving data structure: if multiple conditions derive from the same dataset, they can be included or excluded as a unit (for example, a time course vs. a deletion collection) • Test using a multivariate z-test • Precalculate the covariance matrix, so selection is still very efficient
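A simplified version of this selection step can be sketched as follows. It substitutes a univariate two-sample z-test per condition for the multivariate z-test and precomputed covariance matrix described on the slide, and it does not group conditions by dataset; the z cutoff of 3.0 and the toy expression values are hypothetical.

```python
import math
from statistics import mean, pvariance

def z_score(in_vals, out_vals):
    """Two-sample z statistic for a difference in means (large-sample approximation)."""
    se = math.sqrt(pvariance(in_vals) / len(in_vals) + pvariance(out_vals) / len(out_vals))
    return (mean(in_vals) - mean(out_vals)) / se if se > 0 else 0.0

def select_conditions(expr, module, z_cut=3.0):
    """Return indices of conditions where module genes differ from all other genes.

    expr maps gene -> list of expression values, one per condition.
    """
    n_conditions = len(next(iter(expr.values())))
    others = [g for g in expr if g not in module]
    selected = []
    for c in range(n_conditions):
        z = z_score([expr[g][c] for g in module], [expr[g][c] for g in others])
        if abs(z) >= z_cut:
            selected.append(c)
    return selected

# Hypothetical data: condition 0 separates the module; condition 1 is noise.
expr = {
    "g1": [5.0, 0.1], "g2": [5.2, -0.1], "g3": [4.8, 0.0],
    "o1": [0.1, 0.2], "o2": [-0.2, -0.1], "o3": [0.0, 0.1],
    "o4": [0.1, -0.2], "o5": [-0.1, 0.0],
}
print(select_conditions(expr, ["g1", "g2", "g3"]))  # [0]
```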

  41. COALESCE: Selecting Significant Motifs • COALESCE looks for three kinds of motifs: k-mers (e.g. ACGACGT), reverse complement pairs (e.g. ACGACAT | ATGTCGT), and probabilistic suffix trees (PSTs) • For every possible motif, compare the distributions of values for genes in the module versus genes not in the module; if they differ significantly, include the motif • This can distinguish flanks from UTRs • Fast: efficient enough to search gene bodies, including exons and introns
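Counting k-mers, optionally merging each k-mer with its reverse complement, is the basic operation behind the first two motif types. The sketch below is a generic illustration (probabilistic suffix trees are not shown); it confirms that ACGACAT and ATGTCGT from the slide are reverse complements of each other. Counting separately per sequence region (upstream flank, 5’ UTR, 3’ UTR, downstream flank) is one way the module vs. non-module comparison could distinguish flanks from UTRs.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def canonical(kmer):
    """One shared key for a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))

def kmer_counts(seq, k, merge_reverse_complements=False):
    """Count every length-k substring, optionally pooling reverse complement pairs."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[canonical(kmer) if merge_reverse_complements else kmer] += 1
    return counts

print(reverse_complement("ACGACAT"))  # ATGTCGT
print(kmer_counts("ACGT", 2, merge_reverse_complements=True))
```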

  42. COALESCE: Selecting Probable Genes • For each gene in the genome, for each significant condition and each significant motif: what’s the probability the gene came from the module’s distribution, and what’s the probability that it came from outside the module? • Distributions of each feature in and out of the developing module are observed from the data • These give the probability of a gene being in the module given some data • A prior is used to stabilize module convergence: genes already in the module are more likely to stay there in the next iteration
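The per-gene calculation on this slide can be sketched as a naive Bayes posterior. The Gaussian form of each feature's in-module and out-of-module distribution, the independence of features, and the prior values 0.5 and 0.1 are illustrative assumptions, not details given on the slide; gene names are placeholders.

```python
import math

def log_normal_pdf(x, mu, sd):
    """Log density of a normal distribution with mean mu and standard deviation sd."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)

def membership_probability(features, in_params, out_params, prior_in):
    """P(gene in module | its values for the selected conditions and motifs).

    features: the gene's value for each selected feature.
    in_params / out_params: (mean, sd) per feature, estimated from genes
    currently inside / outside the module. Assumes independent Gaussian features.
    """
    log_in = math.log(prior_in)
    log_out = math.log(1 - prior_in)
    for x, (mu_i, sd_i), (mu_o, sd_o) in zip(features, in_params, out_params):
        log_in += log_normal_pdf(x, mu_i, sd_i)
        log_out += log_normal_pdf(x, mu_o, sd_o)
    m = max(log_in, log_out)
    return math.exp(log_in - m) / (math.exp(log_in - m) + math.exp(log_out - m))

def prior_for(gene, module, p_member=0.5, p_new=0.1):
    """Hypothetical prior: genes already in the module are more likely to stay."""
    return p_member if gene in module else p_new

# Hypothetical gene with two selected features (one condition, one motif count).
p = membership_probability(
    [4.5, 2.0],
    in_params=[(5.0, 1.0), (2.0, 0.5)],
    out_params=[(0.0, 1.0), (0.2, 0.5)],
    prior_in=prior_for("geneA", module={"geneA"}),
)
print(round(p, 4))
```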

  43. COALESCE: Integrating Additional Data Types (nucleosome placement, evolutionary conservation) • These can be included as additional datasets and feature-selected just like expression conditions/motifs • Or they can be used as a prior or weight on the values of individual motifs. Example sequence: TCCGGTAGAACTACTGGTATTGTTTTGGATTCCGGTGATG
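As a sketch of the second option, using conservation as a weight on individual motif values, each motif occurrence can contribute its mean per-base conservation score rather than a flat count of 1. The per-base scores are a hypothetical input (for example, derived from a multiple alignment), and averaging over the motif window is an illustrative choice.

```python
def weighted_motif_count(seq, conservation, motif):
    """Count motif occurrences, weighting each by its mean per-base conservation.

    conservation: one score in [0, 1] per base of seq (hypothetical input).
    A perfectly conserved hit counts 1.0; an unconserved hit counts 0.0.
    """
    if len(conservation) != len(seq):
        raise ValueError("need one conservation score per base")
    k = len(motif)
    total = 0.0
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] == motif:
            window = conservation[i:i + k]
            total += sum(window) / k
    return total

seq = "TCCGGTAGAACTACTGGTATTGTTTTGGATTCCGGTGATG"
print(weighted_motif_count(seq, [0.5] * len(seq), "TCCGG"))
```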

  44. COALESCE Results: S. cerevisiae Modules. The haystack: ~6,000 genes in ~2,200 conditions. A needle: a module of 100 genes in 80 conditions.

  45. COALESCE Results: S. cerevisiae Modules • Conjugation: 54 genes, 144 conditions; regulator Ste12 • Mitosis and DNA replication: 112 genes, 82 conditions; regulators Stb1/Swi6 • Budding: 33 genes, 434 conditions; regulator Swi5

  46. COALESCE Results: S. cerevisiae Modules • Glycolysis, iron and phosphate transport, amino acid metabolism…: 126 genes, 660 conditions; helix-loop-helix motif (Tye7/Cbf1/Pho4) • Iron transport: 50 genes, 775 conditions; regulators Aft1/2 • Phosphate transport: 11 genes, 844 conditions; regulator Pho4

  47. COALESCE Results: S. cerevisiae Modules • Mitochondrial translation: 72 genes, 319 conditions; regulator Puf3 • …plus more ribosome clusters than you can shake a stick at!

  48. COALESCE Results: Yeast TF/Target Accuracy

  49. COALESCE Results: TF/Targets Influenced by Supporting Data. Figure categories: improved only by both; decreased by additional data; improved by conservation; improved by any additional data, mainly conservation.

  50. COALESCE Results: Yeast Clustering Accuracy • ~2,200 yeast conditions • Recapitulation of known biology from Gene Ontology
