
341: Introduction to Bioinformatics



Presentation Transcript


  1. 341: Introduction to Bioinformatics Dr. Nataša Pržulj Department of Computing Imperial College London natasha@imperial.ac.uk Winter 2011

  2. Topics • Introduction: biology • Introduction: graph theory • Network properties • Network/node centralities • Network motifs • Network models • Network/node clustering • Network comparison/alignment • Software tools for network analysis • Interplay between topology and biology

  3. Network Properties 1. Global Network Properties (Chapter 3 of the course textbook “Analysis of Biological Networks” by Junker and Schreiber) They give an overall view of a network: • Degree distribution • Clustering coefficient and spectrum • Average diameter

  4. 1) Degree Distribution (figure: example network G)
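As a minimal sketch (not code from the course), the degree distribution P(k), the fraction of nodes with degree k, can be computed directly from an adjacency structure. The dict-of-sets input format and the toy graph are assumptions for illustration only.

```python
from collections import Counter

def degree_distribution(adj):
    """Return {k: fraction of nodes with degree k} for an undirected graph.

    `adj` maps each node to the set of its neighbours (assumed input format).
    """
    n = len(adj)
    counts = Counter(len(neighbours) for neighbours in adj.values())
    return {k: counts[k] / n for k in sorted(counts)}

# Hypothetical toy graph: a triangle A-B-C with a pendant node D attached to C.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
print(degree_distribution(adj))  # {1: 0.25, 2: 0.5, 3: 0.25}
```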

  5. Research debates… • Degree correlation: • Pearson correlation coefficient between the degrees of adjacent vertices • Average neighbor degree of each node, then averaged over all nodes of degree k • Structural robustness and attack tolerance: • “Robust, yet fragile” • Scale-free degree distribution: • Party vs. date hubs • J.D. Han et al., Nature, 430:88-93, 2004 • Bias in the data construction (sampling)? • M. Stumpf et al., PNAS, 102:4221-4224, 2005 • J. Han et al., Nature Biotechnology, 23:839-844, 2005 • High degree nodes: • Essential genes • H. Jeong et al., Nature 411, 2001. • Disease/cancer genes • Jonsson and Bates, Bioinformatics, 22(18), 2006 • Goh et al., PNAS, 104(21), 2007
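The two degree-correlation measures listed above can be sketched as follows, assuming an undirected graph stored as a dict of neighbour sets. Counting each edge in both directions is one common convention for the Pearson coefficient, chosen here as an assumption; the star graph is a hypothetical example.

```python
import math
from collections import defaultdict

def degree_correlation(adj):
    """Pearson correlation between the degrees at the two ends of each edge.

    Each undirected edge (u, v) contributes both (deg u, deg v) and
    (deg v, deg u), so the measure is symmetric.
    """
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # undefined (division by zero) for regular graphs

def avg_neighbor_degree_by_k(adj):
    """k_nn(k): average neighbour degree of each node, averaged over nodes of degree k."""
    per_k = defaultdict(list)
    for u, nbrs in adj.items():
        if nbrs:
            per_k[len(nbrs)].append(sum(len(adj[v]) for v in nbrs) / len(nbrs))
    return {k: sum(vals) / len(vals) for k, vals in sorted(per_k.items())}

# Hypothetical star: hubs link only to low-degree nodes (disassortative).
star = {"c": {"x", "y", "z"}, "x": {"c"}, "y": {"c"}, "z": {"c"}}
print(round(degree_correlation(star), 6))  # -1.0
print(avg_neighbor_degree_by_k(star))      # {1: 3.0, 3: 1.0}
```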

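  6. 2) Clustering Coefficient and Spectrum • C_v – clustering coefficient of node v: the number of edges among the neighbors of v divided by the number of possible edges among them, $C_v = \frac{2E_v}{\deg(v)(\deg(v)-1)}$, where $E_v$ is the number of edges among the neighbors of v • C_A = 1/1 = 1 • C_B = 1/3 = 0.33 • C_C = 0 • C_D = 2/10 = 0.2 • … • C – average clustering coefficient of the whole network: C = avg {C_v over all nodes v of G} • C(k) – average clustering coefficient of all nodes of degree k • E.g.: C(2) = (C_A + C_C)/2 = (1 + 0)/2 = 0.5 • => Clustering spectrum (figure: example clustering spectrum, not for G) • We need to evaluate whether the value of C (or any other property) is statistically significant.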
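A short sketch of C_v, the network average C, and the clustering spectrum C(k), assuming a dict-of-sets adjacency structure and a hypothetical toy graph. Giving nodes of degree 0 or 1 a coefficient of 0 is a convention chosen here, not something the slide specifies.

```python
from collections import defaultdict
from itertools import combinations

def clustering_coefficient(adj, v):
    """C_v = (edges among neighbours of v) / (possible edges among them)."""
    nbrs = adj[v]
    d = len(nbrs)
    if d < 2:
        return 0.0  # assumed convention for nodes of degree 0 or 1
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (d * (d - 1) / 2)

def average_clustering(adj):
    """C: average of C_v over all nodes v."""
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)

def clustering_spectrum(adj):
    """C(k): average clustering coefficient of all nodes of degree k."""
    by_k = defaultdict(list)
    for v in adj:
        by_k[len(adj[v])].append(clustering_coefficient(adj, v))
    return {k: sum(c) / len(c) for k, c in sorted(by_k.items())}

# Hypothetical toy graph: triangle A-B-C plus node D attached to C.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(clustering_spectrum(adj))  # {1: 0.0, 2: 1.0, 3: 0.3333333333333333}
print(average_clustering(adj))
```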

  7. 3) Average Diameter • Distance between a pair of nodes u and v (figure: network G with nodes u and v marked): • D_{u,v} = min {length of all paths between u and v} = min {3, 4, 3, 2} = 2 = dist(u,v) • Average diameter of the whole network: • D = avg {D_{u,v} for all pairs of nodes {u,v} in G} • Spectrum of the shortest path lengths (figure: example spectrum, not for G)
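Because the network is unweighted, breadth-first search gives D_{u,v} directly. This sketch (assumed dict-of-sets input, hypothetical toy graph) builds the spectrum of shortest path lengths and averages it. Skipping pairs that lie in different components is an assumption, since the slide does not say how disconnected pairs are handled.

```python
from collections import Counter, deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def path_length_spectrum(adj):
    """Counter {d: number of unordered node pairs at shortest distance d}."""
    spectrum = Counter()
    nodes = list(adj)
    for i, u in enumerate(nodes):
        dist = bfs_distances(adj, u)
        for v in nodes[i + 1:]:
            if v in dist:  # pairs in different components are skipped (assumption)
                spectrum[dist[v]] += 1
    return spectrum

def average_diameter(adj):
    """D: mean shortest-path length over all connected pairs."""
    spectrum = path_length_spectrum(adj)
    pairs = sum(spectrum.values())
    return sum(d * c for d, c in spectrum.items()) / pairs

# Hypothetical toy graph: triangle A-B-C plus node D attached to C.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(sorted(path_length_spectrum(adj).items()))  # [(1, 4), (2, 2)]
print(average_diameter(adj))
```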

  8. Network Properties • Global network properties might not be detailed enough to capture complex topological characteristics of large networks

  9. Network Properties 2. Local Network Properties (Chapter 5 of the course textbook “Analysis of Biological Networks” by Junker and Schreiber) They encompass a larger number of constraints, thus reducing the degrees of freedom in which the networks being compared can vary. How do we show that two networks are different? How do we show that they are the same? How do we quantify the level of their similarity?

  10. Network Properties 2. Local Network Properties (Chapter 5 of the course textbook “Analysis of Biological Networks” by Junker and Schreiber) • Network motifs • Graphlets: • 2.1) Relative Graphlet Frequency Distance between 2 networks • 2.2) Graphlet Degree Distribution Agreement between 2 networks


  12. 1) Network motifs (Uri Alon’s group, ’02-’04) • Small subgraphs that are overrepresented in a network when compared to randomized networks • Network motifs: • Reflect the underlying evolutionary processes that generated the network • Carry functional information • Define superfamilies of networks: $Z_i = (N^{real}_i - \langle N^{rand}_i \rangle) / \mathrm{std}(N^{rand}_i)$ is the statistical significance of subgraph i, and the significance profile $SP_i = Z_i / \sqrt{\sum_j Z_j^2}$ is a normalized vector whose entries lie between -1 and 1 • But: • Functionally important but not statistically significant patterns could be missed • The choice of the appropriate null model is crucial, especially across “families” • Random graphs with the same in- and out-degree distribution as the data might not be the best network null model
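To make the Z-score and the significance profile concrete, the sketch below counts one candidate motif, the feed-forward loop (x->y, y->z, x->z), and builds a randomized ensemble by degree-preserving edge swaps, i.e. a null model with the same in- and out-degrees as the data. It is not the mfinder, MAVisto, or FANMOD implementation: the function names are hypothetical, the count is non-induced, and self-loops are assumed absent.

```python
import math
import random
import statistics

def count_feed_forward_loops(edges):
    """Count (non-induced) feed-forward loops x->y, y->z, x->z."""
    out = {}
    for a, b in edges:
        out.setdefault(a, set()).add(b)
    count = 0
    for x, targets in out.items():
        for y in targets:
            for z in out.get(y, ()):
                # x is the unique source and y the intermediate, so each loop counts once
                if z != x and z in targets:
                    count += 1
    return count

def randomize_preserving_degrees(edges, n_swaps, rng):
    """Swap (a,b),(c,d) -> (a,d),(c,b); keeps every in- and out-degree."""
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue  # swap would create a self-loop or a duplicate edge
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def z_score(real_count, random_counts):
    """Z = (N_real - mean(N_rand)) / std(N_rand); undefined if std is 0."""
    return (real_count - statistics.mean(random_counts)) / statistics.stdev(random_counts)

def significance_profile(z_scores):
    """SP_i = Z_i / sqrt(sum_j Z_j^2): a unit-length vector."""
    norm = math.sqrt(sum(z * z for z in z_scores))
    return [z / norm for z in z_scores]

# Hypothetical toy directed network (real data would be much larger).
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5), (5, 1), (2, 4)]
rng = random.Random(0)
real = count_feed_forward_loops(edges)
null = [count_feed_forward_loops(randomize_preserving_degrees(edges, 100, rng))
        for _ in range(100)]
print(real, statistics.mean(null))
print(significance_profile([3.0, 4.0]))  # [0.6, 0.8]
```

With a realistic network, z_score(real, null) would be computed for each subgraph type and the resulting Z-scores passed to significance_profile.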

  13. 1) Network motifs (Uri Alon’s group, ’02-’04) http://www.weizmann.ac.il/mcb/UriAlon/ Also, see Pajek, MAVisto, and FANMOD

  14. 2) Graphlets (Przulj group, ’04-’10) • Small, connected, non-isomorphic induced subgraphs of a large network • Different from network motifs: • Induced subgraphs • Of any frequency N. Przulj, D. G. Corneil, and I. Jurisica, “Modeling Interactome: Scale Free or Geometric?,” Bioinformatics, vol. 20, num. 18, pg. 3508-3515, 2004.
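Since graphlets are induced subgraphs, a 3-node graphlet is either a triangle or an induced path on three nodes (two edges, with the third edge missing). The sketch below counts only these two types and leaves out the 4- and 5-node graphlets; the dict-of-sets input and the toy graph are assumptions.

```python
from itertools import combinations

def count_three_node_graphlets(adj):
    """Return (induced 3-node paths, triangles) for an undirected graph."""
    paths = 0
    triangle_sightings = 0
    for v, nbrs in adj.items():
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                triangle_sightings += 1  # each triangle is seen from all 3 of its nodes
            else:
                paths += 1  # induced path a - v - b, seen only from its middle node v
    return paths, triangle_sightings // 3

# Hypothetical toy graph: triangle A-B-C plus node D attached to C.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(count_three_node_graphlets(adj))  # (2, 1)
```

The two induced paths here are A-C-D and B-C-D; the path A-C-B is not counted because the edge A-B makes {A, B, C} a triangle.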


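  17. 2.1) Relative Graphlet Frequency (RGF) distance between networks G and H: $D(G,H) = \sum_{i=1}^{29} |F_i(G) - F_i(H)|$, where $F_i(G) = -\log(N_i(G)/T(G))$, $N_i(G)$ is the number of graphlets of type i in G, and $T(G) = \sum_{i=1}^{29} N_i(G)$ is the total number of graphlets in G (the 29 graphlets on 3, 4, and 5 nodes). N. Przulj, D. G. Corneil, and I. Jurisica, “Modeling Interactome: Scale Free or Geometric?,” Bioinformatics, vol. 20, num. 18, pg. 3508-3515, 2004.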
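Given graphlet counts for two networks, the RGF distance can be sketched as below. Natural logarithms and the requirement that every count be positive (so that -log is finite) are assumptions of this sketch, and the toy counts are hypothetical; in practice each list would hold all graphlet types considered.

```python
import math

def relative_frequencies(counts):
    """F_i = -log(N_i / T), with T the total number of graphlets counted."""
    if any(c <= 0 for c in counts):
        raise ValueError("this sketch assumes every graphlet count is positive")
    total = sum(counts)
    return [-math.log(c / total) for c in counts]

def rgf_distance(counts_g, counts_h):
    """D(G, H) = sum over graphlet types i of |F_i(G) - F_i(H)|."""
    fg, fh = relative_frequencies(counts_g), relative_frequencies(counts_h)
    return sum(abs(a - b) for a, b in zip(fg, fh))

# Hypothetical counts of (3-node paths, triangles) in two networks.
print(rgf_distance([2, 1], [4, 2]))  # 0.0: same relative frequencies
print(rgf_distance([1, 1], [3, 1]))  # log(3), about 1.0986
```

Taking logarithms keeps very frequent graphlet types from dominating the sum, since raw counts of different graphlets can differ by orders of magnitude.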

  18. 2.2) Graphlet Degree Distributions Generalize node degree

  19. N. Przulj, “Biological Network Comparison Using Graphlet Degree Distribution,” ECCB, Bioinformatics, vol. 23, pg. e177-e183, 2007.


  21. Network structure vs. biological function & disease: Graphlet Degree (GD) vectors, or “node signatures” T. Milenkovic and N. Przulj, “Uncovering Biological Network Function via Graphlet Degree Signatures”, Cancer Informatics, vol. 6, pg. 257-273, 2008.
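As a hedged illustration of node signatures, the sketch below computes the graphlet degree of every node for the four orbits of the 2- and 3-node graphlets only (a full signature uses the 73 orbits of the 2- to 5-node graphlets). The orbit numbering in the comments is our reading of the cited papers and should be treated as an assumption; the toy graph is hypothetical.

```python
from itertools import combinations

def graphlet_degree_vectors(adj):
    """Per-node counts of orbits 0-3 (2- and 3-node graphlets only).

    orbit 0: end of an edge (this is the node degree)
    orbit 1: end node of an induced 3-node path
    orbit 2: middle node of an induced 3-node path
    orbit 3: node of a triangle
    """
    sig = {v: [0, 0, 0, 0] for v in adj}
    for v, nbrs in adj.items():
        sig[v][0] = len(nbrs)
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                sig[v][3] += 1  # each triangle through v is seen once from v
            else:
                sig[v][2] += 1  # induced path a - v - b with v in the middle
                sig[a][1] += 1  # a and b are the path's end nodes
                sig[b][1] += 1
    return sig

# Hypothetical toy graph: triangle A-B-C plus node D attached to C.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
for node, vector in sorted(graphlet_degree_vectors(adj).items()):
    print(node, vector)
# A [2, 1, 0, 1]
# B [2, 1, 0, 1]
# C [3, 0, 2, 1]
# D [1, 2, 0, 0]
```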

  22. Similarity measure between “node signature” vectors

  23. Signature Similarity Measure between nodes u and v
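The slide's formula did not survive extraction. The sketch below assumes a log-scaled, per-orbit distance of the kind described in the cited paper, so that large counts do not swamp small ones; the orbit weights w_i used there are not reproduced, and equal weights are a placeholder assumption. Similarity is taken as one minus the weighted average distance.

```python
import math

def signature_distance(u_sig, v_sig, weights=None):
    """Weighted average over orbits i of a log-scaled difference:

        D_i = w_i * |log(u_i + 1) - log(v_i + 1)| / log(max(u_i, v_i) + 2)

    Each term is below w_i, so the result lies in [0, 1).
    """
    if weights is None:
        weights = [1.0] * len(u_sig)  # placeholder: equal weights (assumption)
    total = 0.0
    for w, a, b in zip(weights, u_sig, v_sig):
        total += w * abs(math.log(a + 1) - math.log(b + 1)) / math.log(max(a, b) + 2)
    return total / sum(weights)

def signature_similarity(u_sig, v_sig, weights=None):
    """S(u, v) = 1 - D(u, v); 1.0 means identical signatures."""
    return 1.0 - signature_distance(u_sig, v_sig, weights)

# Hypothetical 4-orbit signatures (orbits 0-3 only; full signatures have 73 entries).
print(signature_similarity([2, 1, 0, 1], [2, 1, 0, 1]))  # 1.0
print(signature_similarity([2, 1, 0, 1], [3, 0, 2, 1]))
```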

  25. (figure: signature similarity example with SMD1, YBR095C, and PMA1; 40%)

  27. (figure: signature similarity example with SMD1, RPO26, and SMB1; 90%*) *The threshold for statistical significance is ~85%

  28. Later we will see how to use this and other techniques to link network structure with biological function

  29. Generalizing the degree distribution of a network • The degree distribution measures the number of nodes “touching” k edges, for each value of k. N. Przulj, “Biological Network Comparison Using Graphlet Degree Distribution,” Bioinformatics, vol. 23, pg. e177-e183, 2007.


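  32. The distance for orbit j is divided by √2 to make it between 0 and 1: $D^j(G,H) = \frac{1}{\sqrt{2}} \left( \sum_{k=1}^{\infty} \left[ N_G^j(k) - N_H^j(k) \right]^2 \right)^{1/2}$, where $N_G^j(k) = \frac{d_G^j(k)/k}{\sum_{\ell \ge 1} d_G^j(\ell)/\ell}$ and $d_G^j(k)$ is the number of nodes of G touching orbit j exactly k times. The agreement for orbit j is $A^j(G,H) = 1 - D^j(G,H)$, and averaging $A^j(G,H)$ over all orbits j gives the Graphlet Degree Distribution (GDD) Agreement between networks G and H.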
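The GDD agreement can be sketched as below, assuming each network is given as a map from nodes to orbit-count vectors. The arithmetic mean over orbits is used here (a geometric mean is a possible alternative), and orbits that no node touches are treated as empty distributions; both choices are assumptions of this sketch, and the signatures are hypothetical.

```python
import math
from collections import Counter

def normalized_gdd(orbit_values):
    """N^j(k) for one orbit j, from the orbit-j count of every node.

    d(k): number of nodes touching orbit j exactly k times (k >= 1)
    S(k) = d(k) / k; N(k) = S(k) / sum_k S(k)
    """
    d = Counter(k for k in orbit_values if k > 0)
    scaled = {k: count / k for k, count in d.items()}
    total = sum(scaled.values())
    if total == 0:
        return {}  # no node touches this orbit (assumed: empty distribution)
    return {k: s / total for k, s in scaled.items()}

def orbit_agreement(values_g, values_h):
    """A^j = 1 - D^j, with D^j the Euclidean distance divided by sqrt(2)."""
    ng, nh = normalized_gdd(values_g), normalized_gdd(values_h)
    keys = set(ng) | set(nh)
    dist = math.sqrt(sum((ng.get(k, 0.0) - nh.get(k, 0.0)) ** 2 for k in keys))
    return 1.0 - dist / math.sqrt(2)

def gdd_agreement(sig_g, sig_h):
    """Arithmetic mean of A^j over all orbits; sig_*: node -> list of orbit counts."""
    n_orbits = len(next(iter(sig_g.values())))
    agreements = [
        orbit_agreement([s[j] for s in sig_g.values()],
                        [s[j] for s in sig_h.values()])
        for j in range(n_orbits)
    ]
    return sum(agreements) / n_orbits

# Hypothetical orbit 0-3 signatures: a triangle with a pendant node, and a 3-node path.
sig_g = {"A": [2, 1, 0, 1], "B": [2, 1, 0, 1], "C": [3, 0, 2, 1], "D": [1, 2, 0, 0]}
sig_h = {"p": [1, 1, 0, 0], "q": [2, 0, 1, 0], "r": [1, 1, 0, 0]}
print(gdd_agreement(sig_g, sig_g))  # 1.0
print(gdd_agreement(sig_g, sig_h))
```

Dividing d(k) by k reduces the weight of the long tail of high counts, and normalizing turns each orbit's distribution into one that sums to 1, which is what bounds the distance by √2.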

  33. Software that implements many of these network properties and compares networks with respect to them: GraphCrunch http://www.ics.uci.edu/~bio-nets/graphcrunch/

  34. Network properties 3. Network/node centralities (Chapter 4 of the course textbook “Analysis of Biological Networks” by Junker and Schreiber) • Rank nodes according to their “topological importance”

  35. 3) Network/node centralities (figure: example network with nodes 1 to 10) If nodes are housing communities, where should a hospital be built?

  36. 3) Network/node centralities (figure: example network with nodes 1 to 6) If nodes are housing communities, where should a hospital be built?

  37. Network properties 3. Network/node centralities • Different centrality measures exist • Centrality values are comparable only within a given network • Values of two different centrality measures are not comparable, even within the same network • Some centrality measures can be applied to connected networks only

  38. 3) Network/node centralities • Degree centrality • Closeness centrality • Eccentricity centrality • Betweenness centrality • Other centrality measures exist, e.g.: • Eigenvector centrality • Subgraph centrality • … • Software tools: Visone (social nets) and CentiBiN (biological nets)

  39. 3) Network/node centralities • Degree centrality: • Nodes with high degrees have high centrality Cd(v)=deg(v) • Closeness centrality: • Nodes with short paths to all other nodes have high centrality
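A minimal sketch of degree and closeness centrality, assuming a connected, unweighted graph stored as a dict of neighbour sets. Closeness is taken here as the reciprocal of the sum of distances to all other nodes, one standard definition; the path graph is a hypothetical example in the spirit of the hospital question.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def degree_centrality(adj):
    """C_d(v) = deg(v)."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def closeness_centrality(adj):
    """C_clo(v) = 1 / (sum of distances from v to all other nodes)."""
    result = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        if len(dist) != len(adj):
            raise ValueError("closeness as defined here needs a connected graph")
        result[v] = 1.0 / sum(dist.values())
    return result

# Hypothetical example: five housing communities along a single road.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
closeness = closeness_centrality(path)
print(max(closeness, key=closeness.get))  # 3, the middle community
```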

  40. 3) Network/node centralities • Eccentricity centrality: • Nodes whose greatest distance to any other node is small have high centrality • Betweenness centrality: • Nodes (or edges) that occur in many of the shortest paths have high centrality
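Eccentricity and betweenness can be sketched as below for a connected, unweighted, undirected graph. Eccentricity centrality is taken as the reciprocal of a node's largest distance, and betweenness uses Brandes' accumulation algorithm, a standard technique the slides do not name; values are unnormalized, and the example graphs are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def eccentricity_centrality(adj):
    """C_ecc(v) = 1 / (largest distance from v to any other node); connected graph assumed."""
    return {v: 1.0 / max(bfs_distances(adj, v).values()) for v in adj}

def betweenness_centrality(adj):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest s-v paths
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}  # dependency of s on each node
        while stack:  # nodes come off in order of non-increasing distance from s
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return {v: c / 2 for v, c in cb.items()}  # each pair {s, t} was counted twice

# Hypothetical path of five nodes: the middle node lies on the most shortest paths.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
print(betweenness_centrality(path))  # {1: 0.0, 2: 3.0, 3: 4.0, 4: 3.0, 5: 0.0}
ecc = eccentricity_centrality(path)
print(max(ecc, key=ecc.get))  # 3
```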

