Networks of Protein Interactions Construction of Networks from Diverse Data Sources

Neda Nategh, CS 374 Lecture 16, November 7, 2006. What we have learned about interaction networks in CS374: properties of interaction networks (Susan); comparison of networks across species.

Presentation Transcript


  1. Networks of Protein Interactions: Construction of Networks from Diverse Data Sources. Neda Nategh, CS 374 Lecture 16, November 7, 2006

  2. What we have learned about interaction networks in CS374 • Properties of interaction networks (Susan) • Comparison of networks across species (Chuan Sheng): network alignment • Construction of networks from diverse data sources (Neda): network integration

  3. Basics of Protein Interaction Networks: Biological aspects

  4. Types of interactions • Physical interactions: protein pairs are in direct contact • Complex interactions: protein pairs participate in the same functional module, such as a metabolic pathway, a signaling network, or a multiprotein complex. Figures: eukaryote-like glycosylation system of Campylobacter jejuni; cell division machinery of Caulobacter crescentus

  5. Protein complex: A protein complex is a group of two or more associated proteins. Networks of proteins • Topological properties • Functional organization. Image: news.uns.purdue.edu/UNS/images/cramer.photo2.jpeg

  6. Metabolic pathway • A metabolic pathway is a series of chemical reactions occurring within a cell, catalyzed by enzymes, that results in either the formation of a metabolic product or the initiation of another metabolic pathway • Metabolic networks. Source: http://en.wikipedia.org/wiki/Metabolic_pathway

  7. Signaling network. Signal transduction: the process by which a cell converts one kind of signal or stimulus into another, through a sequence of biochemical reactions inside the cell that are carried out by enzymes and linked through second messengers. Source: http://en.wikipedia.org/wiki/Signal_transduction

  8. High-throughput data • Co-expression • Co-location • Co-inheritance • Co-evolution • Co-citation • Rosetta stone (gene fusions)

  9. Expression: Gene expression, or simply expression, is the process by which a gene's DNA sequence is converted into the structures and functions of a cell. Indirectly, the expression of particular genes may be assessed with DNA microarray technology, which can provide a rough measure of the cellular concentration of different messenger RNAs. Source: http://en.wikipedia.org/wiki/DNA_microarray

  10. Inheritance: Proteins are clustered according to the similarity of their phylogenetic profiles. Similar profiles show a correlated pattern of inheritance and, by implication, functional linkage.

  11. Evolution: Evolution is the change in the heritable traits of a population over successive generations, as determined by shifts in the allele frequencies of genes.

  12. Gene fusion • A fusion gene is a hybrid gene formed from two previously separate genes. It can arise through translocation, interstitial deletion, or chromosomal inversion. • By creating a fusion gene of a protein of interest and green fluorescent protein, the protein of interest may be observed in cells or tissue using fluorescence microscopy. • The protein synthesized when a fusion gene is expressed is called a fusion protein. Source: http://en.wikipedia.org/wiki/Gene_fusion

  13. Experiments • Microarray analysis of gene expression • Systematic protein interaction mapping • Mass spectrometry • Yeast two hybrid • Synthetic lethal screens

  14. Microarray analysis of gene expression. A DNA microarray (also called a gene chip, genome chip, DNA chip, or gene array) is a collection of microscopic DNA spots attached to a solid surface, such as a glass, plastic, or silicon chip, forming an array used for expression profiling, that is, monitoring expression levels for thousands of genes simultaneously. Applications: • Identification of sequence • Determination of expression level of genes. Image: http://phys.chem.ntnu.no/~bka/images/MicroArrays.jpg

  15. Affinity purification/Mass spectrometry • For characterization of proteins • Uses quantitative mass spectrometry to analyze the composition of a partially purified protein complex • Interacting proteins can be distinguished from nonspecifically co-purifying proteins by their abundance ratios • Complexes can be analyzed after a single-step purification, which gives better detection of weakly associated proteins. Image: http://en.wikipedia.org/wiki/Image:Mass_spectrom.gif

  16. Yeast two-hybrid: Two-hybrid screening is a molecular biology technique used to discover protein-protein interactions by testing for physical interactions (such as binding) between two proteins. Figure credit: Susan Tang's presentation, CS374 Algorithms in Biology, Stanford University

  17. Synthetic lethal screening • Interprets genetic networks by examining the effects on the cell when pairs of genes are knocked out simultaneously • Knocking out each gene separately may have no phenotypic effect because of the robustness provided by genetic redundancy, but knocking out both genes has a severe, possibly lethal, effect.
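
The screening logic on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the protocol of any particular screen: the function name, the gene names, the fitness values, and the `viable_threshold` cutoff are all hypothetical.

```python
from itertools import combinations


def synthetic_lethal_pairs(single_fitness, double_fitness, viable_threshold=0.5):
    """Return gene pairs whose single knockouts are viable but whose
    double knockout is not.

    Fitness values are relative to wild type (1.0). The threshold is an
    illustrative assumption, not a value from the lecture.
    """
    hits = []
    for a, b in combinations(sorted(single_fitness), 2):
        pair = (a, b)
        if pair not in double_fitness:
            continue  # this pair was not assayed
        singles_viable = (single_fitness[a] >= viable_threshold
                          and single_fitness[b] >= viable_threshold)
        double_sick = double_fitness[pair] < viable_threshold
        if singles_viable and double_sick:
            hits.append(pair)
    return hits


# Hypothetical measurements: geneA and geneB are individually dispensable,
# geneC is already sick on its own.
single_fitness = {"geneA": 0.95, "geneB": 0.90, "geneC": 0.20}
double_fitness = {
    ("geneA", "geneB"): 0.05,
    ("geneA", "geneC"): 0.10,
    ("geneB", "geneC"): 0.15,
}
print(synthetic_lethal_pairs(single_fitness, double_fitness))
```

Only the geneA/geneB pair is reported, because pairs involving geneC fail the requirement that each single knockout be viable on its own.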

  18. Basics of Protein Interaction Networks: Computational aspects

  19. Statistics terminology • Probability • Probability density • Conditional probability • Prior/Posterior probability • Bayes’ rule

  20. Statistics terminology

  21. Graph theory: We map interaction networks to graphs. Figure labels: vertex (node), edge, directed edge (arc), weighted edge, cycle

  22. Networks in our model • Undirected graphs • Nodes correspond to proteins • Edges represent the interactions • Edge weights represent interaction probabilities
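
One way to hold such a network in memory is sketched below. It assumes nothing beyond the four bullets on this slide; the class name `InteractionNetwork`, its methods, and the protein identifiers are hypothetical.

```python
class InteractionNetwork:
    """Undirected graph: nodes are proteins, edge weights are
    interaction probabilities."""

    def __init__(self):
        # Keyed by frozenset({u, v}) so that (u, v) and (v, u) are one edge.
        self._weights = {}

    def add_interaction(self, u, v, probability):
        if u == v:
            raise ValueError("self-interactions are not modeled in this sketch")
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probability must lie in [0, 1]")
        self._weights[frozenset((u, v))] = probability

    def probability(self, u, v):
        """Interaction probability, or 0.0 if no edge was recorded."""
        return self._weights.get(frozenset((u, v)), 0.0)

    def neighbors(self, u, min_probability=0.0):
        """Sorted partners of u whose edge weight is at least min_probability."""
        result = []
        for pair, p in self._weights.items():
            if u in pair and p >= min_probability:
                (other,) = pair - {u}
                result.append(other)
        return sorted(result)


net = InteractionNetwork()
net.add_interaction("P1", "P2", 0.9)
net.add_interaction("P2", "P3", 0.4)
print(net.probability("P2", "P1"))   # same edge regardless of argument order
print(net.neighbors("P2", min_probability=0.5))
```

Storing each edge under an unordered pair keeps the graph undirected by construction, so there is no risk of the two directions drifting apart.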

  23. Network clustering: 7000 yeast interactions among 3000 proteins

  24. Training sets • KEGG (Kyoto Encyclopedia of Genes and Genomes) • GFP (Green Fluorescent Protein) • GO (Gene Ontology) • COG (Clusters of Orthologous Groups of proteins)

  25. Genomics • Genomics (1 genome): assembly, gene finding • Comparative genomics (N genomes): sequence alignment • Functional genomics (1 assay): microarray analysis • Integrative genomics (N assays): network integration

  26. A probabilistic functional network of yeast genes. Insuk Lee, Shailesh V. Date, Alex T. Adai, Edward M. Marcotte

  27. Motivation: Knowledge of the structure of gene networks helps explain • the complex roles of individual genes • the interplay between many systems in a cell

  28. Problem: Heterogeneous functional genomics data measure different aspects of gene or protein associations • Microarray analyses of gene expression • Systematic protein interaction mapping • Mass spectrometry measures the tendency for proteins to be components of the same physical module • Yeast two-hybrid assays indicate direct physical interaction (stable or transient) between proteins • Synthetic lethal screens measure the tendency for genes to compensate for the loss of other genes

  29. Idea of integration: Construct a more accurate and extensive gene network by considering functional rather than physical associations. Genetic, biochemical, and computational evidence is combined into probabilistic gene-gene linkages that form a single coherent network.

  30. Scoring scheme • Based on a Bayesian statistics approach • Log-likelihood score (LLS) • P(L|E) and ~P(L|E): frequencies of linkages (L) observed in the given experiment (E) between annotated genes operating in the same pathway and in different pathways, respectively • P(L) and ~P(L): total frequencies of linkages between all annotated yeast genes operating in the same pathway and in different pathways, respectively

  31. Scoring scheme • LLS > 0: the experiment tends to link genes in the same pathway • Higher scores indicate more confident linkages and are proportional to the accuracy of the experiment • Scores from different experiments are directly comparable
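
The log-likelihood score from the two scoring slides can be computed from pair counts. The transcript does not show the formula, so the sketch below assumes the form LLS = ln[(P(L|E) / ~P(L|E)) / (P(L) / ~P(L))], the log ratio of the odds of linkage given the experiment to the prior odds. The function name and all counts are hypothetical.

```python
import math


def log_likelihood_score(same_in_exp, diff_in_exp, same_total, diff_total):
    """Assumed form: LLS = ln( (P(L|E) / ~P(L|E)) / (P(L) / ~P(L)) ).

    same_in_exp / diff_in_exp: annotated gene pairs linked by the experiment
        that fall in the same pathway / in different pathways.
    same_total / diff_total: all annotated gene pairs in the same pathway /
        in different pathways.
    Ratios of counts equal ratios of frequencies, so counts suffice.
    """
    if min(same_in_exp, diff_in_exp, same_total, diff_total) <= 0:
        raise ValueError("all counts must be positive")
    odds_given_evidence = same_in_exp / diff_in_exp
    prior_odds = same_total / diff_total
    return math.log(odds_given_evidence / prior_odds)


# Hypothetical benchmark: 80 of the experiment's 100 linked pairs share a
# pathway, while only 1 in 10 of all annotated pairs do.
informative = log_likelihood_score(80, 20, 1000, 9000)
# An experiment whose linked pairs share pathways at the background rate.
uninformative = log_likelihood_score(10, 90, 1000, 9000)
print(round(informative, 3), round(uninformative, 3))
```

Under this form, an experiment that links same-pathway genes no more often than chance scores 0, which matches the slide's reading of LLS > 0 as evidence for shared pathways.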

  32. Data sources

  33. Benchmarked accuracy and extent of functional genomics data sets and the integrated networks

  34. Results • Evidence from diverse sources is combined into a probabilistic gene network • The network estimates the functional coupling between yeast genes • It gives a view of relations between yeast proteins distinct from their physical interactions

  35. Future directions: Application of this strategy to other organisms, such as human • (i) assemble benchmarks for measuring the accuracy of linkages between human genes • (ii) assemble gold standard sets of highly accurate interactions for calibrating the benchmarks • (iii) benchmark functional genomics data for their ability to correctly link human genes • then integrate the data as described.

  36. Integrated protein interaction networks for 11 microbes. Balaji S. Srinivasan, Antal F. Novak, Jason A. Flannick, Serafim Batzoglou, Harley H. McAdams

  37. Motivation: There are different methods to predict interactions, but the networks generated by each method are often contradictory. Objective: construct a summary network for each species that uses all the evidence at hand to predict which proteins are functionally linked.

  38. Data source: Co-expression (expression). Microarray data (genes by arrays) are converted by Pearson correlation into a gene-by-gene matrix. Example matrix for Genes A, B, C: A-B 0.8, A-C -0.7, B-C -0.6, with 1 on the diagonal. Credit: Balaji S. Srinivasan
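
The step from expression profiles to a gene-by-gene matrix can be sketched as follows. The slide's underlying microarray values are not recoverable from the transcript, so the profiles here are made up, and the names `pearson` and `correlation_matrix` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (norm_x * norm_y)


def correlation_matrix(profiles):
    """Map every ordered gene pair (g, h) to the correlation of their profiles."""
    names = sorted(profiles)
    return {(g, h): pearson(profiles[g], profiles[h])
            for g in names for h in names}


# Hypothetical expression levels across four arrays.
profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.0, 4.0, 6.0, 8.0],   # rises with gene_a
    "gene_c": [4.0, 3.0, 2.0, 1.0],   # falls as gene_a rises
}
matrix = correlation_matrix(profiles)
print(round(matrix[("gene_a", "gene_b")], 2), round(matrix[("gene_a", "gene_c")], 2))
```

Each entry of the resulting matrix plays the role of one co-expression edge weight between a pair of genes.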

  39. Data source: Co-location (location). Assembled genomes (Chrom 1 to Chrom 4) are converted by average chromosomal distance into a protein-by-protein matrix. Example matrix for Proteins A, B, C: A-B 0.06, A-C 0.25, B-C 0.25, with 0 on the diagonal. Credit: Balaji S. Srinivasan

  40. Data source: Co-evolution (evolution). Multiple alignments of protein families (A, A', A'', A'''; B, B', B'', B'''; C, C', C'', C''') are converted by tree distances into a family-by-family matrix. Example matrix for Protein Families A, B, C: A-B 0.9, A-C -0.8, B-C -0.7, with 1 on the diagonal. Credit: Balaji S. Srinivasan

  41. Data source: Co-inheritance (inheritance). BLAST bit scores across four species are converted by Spearman correlation into a protein-by-protein matrix. Example bit scores: Protein A 400, 200, 300, 100; Protein B 500, 250, 250, 50; Protein C 100, 300, 200, 600. Resulting matrix: A-B 0.95, A-C -1, B-C -0.95, with 1 on the diagonal. Credit: Balaji S. Srinivasan
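
The numbers on this slide can be reproduced with a short Spearman implementation. The bit scores are the ones shown for Proteins A, B, and C; the function names are hypothetical, and tied scores are given their average rank, which is a common convention assumed here rather than stated on the slide.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (norm_x * norm_y)


# BLAST bit scores from the slide, one value per species.
protein_a = [400, 200, 300, 100]
protein_b = [500, 250, 250, 50]
protein_c = [100, 300, 200, 600]

print(round(spearman(protein_a, protein_b), 2))
print(round(spearman(protein_a, protein_c), 2))
print(round(spearman(protein_b, protein_c), 2))
```

Because only ranks matter, proteins A and B correlate strongly even though their raw bit scores differ in scale.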

  42. Integration of two predictors • Previous work • Recent work • Method presented in this paper

  43. Previous work: We can integrate two given networks (for example, coexpression + coinheritance) by • intersection • union • average
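
The three combination rules can be sketched on two small weighted networks. The slide names the rules but not how edge scores are merged, so the choices below are assumptions: intersection keeps the smaller score, union keeps the larger, and average treats a missing edge as 0. The function names and all scores are hypothetical.

```python
def edge(u, v):
    """Unordered protein pair used as a dictionary key."""
    return frozenset((u, v))


def integrate(net_a, net_b, mode):
    """Combine two networks given as {edge: score} dictionaries."""
    pairs_a, pairs_b = set(net_a), set(net_b)
    if mode == "intersection":
        # Keep only edges supported by both predictors (assumed score: min).
        return {p: min(net_a[p], net_b[p]) for p in pairs_a & pairs_b}
    if mode == "union":
        # Keep edges supported by either predictor (assumed score: max).
        return {p: max(net_a.get(p, 0.0), net_b.get(p, 0.0))
                for p in pairs_a | pairs_b}
    if mode == "average":
        # Mean of the two scores, counting a missing edge as 0.
        return {p: (net_a.get(p, 0.0) + net_b.get(p, 0.0)) / 2
                for p in pairs_a | pairs_b}
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical scores from two predictors.
coexpression = {edge("A", "B"): 0.8, edge("B", "C"): 0.4}
coinheritance = {edge("A", "B"): 0.6, edge("A", "C"): 0.9}

for mode in ("intersection", "union", "average"):
    combined = integrate(coexpression, coinheritance, mode)
    print(mode, sorted((tuple(sorted(p)), round(s, 2)) for p, s in combined.items()))
```

Intersection trades coverage for precision and union does the opposite, while none of the three rules uses training data to weigh one predictor against the other.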

  44. Recent work • Decision trees (Wong 2004) • Bayesian networks (Troyanskaya 2003) • Likelihood ratios (Lee 2004) • Naïve Bayes + boosting (Lu 2005)

  45. Training sets • COG • GO • KEGG. Going from COG to GO to KEGG: • The fraction of annotated proteins in a given organism decreases • Annotation quality increases

  46. 1D Bayes' rule: calculate the conditional probability of linkage given evidence. Credit: Balaji S. Srinivasan

  47. 1D Bayes' rule: For a protein pair (A, B) with known evidence E and unknown linkage L, estimate P(L|E), where L=1 means same function and L=0 means different function. Bayes error rate = minimum error rate of a classifier. Credit: Balaji S. Srinivasan
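
A minimal sketch of the 1D calculation, using Bayes' rule in the form P(L=1|E) = P(E|L=1) P(L=1) / [P(E|L=1) P(L=1) + P(E|L=0) P(L=0)]. The Gaussian class-conditional densities, their parameters, and the prior are illustrative assumptions, not values from the lecture; in practice the densities would be estimated from a training set.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution, used here as a stand-in for
    densities that would normally be estimated from training pairs."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_linked(e, density_linked, density_unlinked, prior_linked):
    """P(L=1 | E=e) by Bayes' rule for a single evidence value e."""
    numerator = density_linked(e) * prior_linked
    denominator = numerator + density_unlinked(e) * (1.0 - prior_linked)
    if denominator == 0:
        raise ValueError("evidence value has zero density under both classes")
    return numerator / denominator


# Hypothetical densities of a co-expression score for linked and unlinked pairs.
def linked_density(e):
    return normal_pdf(e, mean=0.6, sd=0.2)


def unlinked_density(e):
    return normal_pdf(e, mean=0.0, sd=0.2)


prior = 0.1  # assumed fraction of protein pairs that share a function
for score in (0.0, 0.3, 0.6):
    print(score, round(posterior_linked(score, linked_density, unlinked_density, prior), 3))
```

Where the two densities are equal the evidence is uninformative and the posterior falls back to the prior; predicting "linked" whenever the posterior exceeds 0.5 is the decision rule whose error the Bayes error rate describes.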

  48. 2D network integration • 2D scatter plot • Separates linked pairs from unlinked pairs more efficiently • Co-expression vs. co-inheritance

  49. 2D network integration • Estimate densities • Kernel density estimation • Gray-Moore dual tree algorithm
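
The density-estimation step can be sketched with a naive Gaussian kernel density estimate over (co-expression, co-inheritance) pairs, followed by Bayes' rule. This sketch evaluates every kernel directly, which costs one pass over the training points per query; the Gray-Moore dual-tree algorithm named on the slide is a way to speed up such evaluations and is not implemented here. The function names, training points, bandwidth, and prior are all hypothetical.

```python
import math


def gaussian_kde_2d(points, bandwidth):
    """Return a function (x, y) -> density built from an isotropic
    Gaussian kernel of the given bandwidth centred on each training point."""
    if not points or bandwidth <= 0:
        raise ValueError("need at least one point and a positive bandwidth")
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            dist_sq = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-dist_sq / (2.0 * bandwidth ** 2))
        return norm * total

    return density


def posterior_linked_2d(x, y, kde_linked, kde_unlinked, prior_linked):
    """P(L=1 | evidence (x, y)) from the two estimated densities."""
    numerator = kde_linked(x, y) * prior_linked
    denominator = numerator + kde_unlinked(x, y) * (1.0 - prior_linked)
    return numerator / denominator


# Hypothetical training pairs: (co-expression score, co-inheritance score).
linked_points = [(0.8, 0.9), (0.7, 0.8), (0.9, 0.7)]
unlinked_points = [(0.1, 0.0), (-0.2, 0.1), (0.0, -0.1), (0.2, 0.2)]

kde_linked = gaussian_kde_2d(linked_points, bandwidth=0.2)
kde_unlinked = gaussian_kde_2d(unlinked_points, bandwidth=0.2)

high = posterior_linked_2d(0.8, 0.8, kde_linked, kde_unlinked, prior_linked=0.1)
low = posterior_linked_2d(0.0, 0.0, kde_linked, kde_unlinked, prior_linked=0.1)
print(round(high, 3), round(low, 3))
```

Estimating the joint density over both scores lets the posterior respond to combinations of evidence, which is what separates this approach from combining two 1D predictors after the fact.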

  50. 2D network integration
