
interMolecular Interactions & Network Inference Unit 17

interMolecular Interactions & Network Inference Unit 17. BIOL221T : Advanced Bioinformatics for Biotechnology. Irene Gabashvili, PhD. From Previous Lectures. Name a pattern-driven algorithm for promoter prediction


Presentation Transcript


  1. interMolecular Interactions & Network Inference, Unit 17. BIOL221T: Advanced Bioinformatics for Biotechnology. Irene Gabashvili, PhD

  2. From Previous Lectures • Name a pattern-driven algorithm for promoter prediction • Name a database for pattern-driven promoter prediction: TRANSFAC, PROMO • Viterbi – HMM algorithm – sequence driven • Dragon Promoter Finder (DPF) – sequence driven, based on promoter recognition models • NNPP2.1, Promoter2.0, PromoterInspector

  3. B&O: Chapter 10 • Intermolecular Interactions and Biological Pathways • Databases • Algorithms • Resources • Visualization Tools • Integration with current high-throughput technologies

  4. Introduction Nothing can happen in biology unless something binds to something else… Major challenge – gain an understanding of the workings of the cell by integrating available information from the various fields of molecular and cellular biology into an accurate model to generate hypotheses for testing

  5. from Cary, Bader, Sander, FEBS Letters 579 (2005) 1815-20

  6. Technologies to map pathways: sequencing, microarrays, proteomics, chromatography, gel-based separation techniques, NMR, (pharmaco)kinetics (enzymology – characterizing rates of reactions), etc.

  7. The genomic era Human genome sequence “completed”, Feb 2001

  8. Novel technologies → lots of data • http://www.genome.gov/25522225 • http://www.genome.gov/15015202 • http://www.illumina.com/pages.ilmn?ID=203 • Sample currently funded projects: • Differential metabolic network analysis of tumor progression • Purdue - Tools for differential metabolomics

  9. Genomics → Pathways • Massive amounts of genomic data are being obtained through: • Sequencing (pyro-, nanopore-, by synthesis, by hybridization) • Microarray technology • Algorithmic treatment of this data: • Gene classification and clustering • Genetic networks (more on Apr 2)

  10. Proteomics → Pathways • Massive amounts of proteomic data are being obtained through: • 2D gel electrophoresis • Mass spectrometry • Algorithmic treatment of this data: • Image analysis of 2D gels • Peptide sequencing and identification • Pathways? (more on Apr 9, 16)

  11. Protein-protein interaction data • Physical interactions • Yeast two-hybrid screens • Affinity purification (mass spec) • Peptide arrays • Protein-DNA by ChIP-chip • Other measures of ‘association’ • Genetic interactions (double deletion mutants) • Genomic context (STRING)

  12. Yeast two-hybrid method. Y2H assays interactions in vivo. It uses the property that transcription factors generally have separable transcriptional activation (AD) and DNA-binding (DBD) domains. A functional transcription factor can be created if a separately expressed AD can be made to interact with a DBD. A protein “bait” B is fused to a DBD and screened against a library of protein “preys”, each fused to an AD.

  13. Mol. Dynamics – April 16, 23 • Molecular dynamics calculations are computationally intensive: for each time step (sub-ps) you need to do several calculations for each atom in the molecule. • For a reasonably sized protein (several hundred amino acids, or thousands of atoms) it takes many hours of supercomputing to map out the motions over nanoseconds. • Fast laser dynamics experiments are just starting to measure the time course of individual molecular motions on the picosecond scale.
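To give a rough feel for the cost described on this slide, the arithmetic can be sketched as follows. The 1 fs time step and the 5,000-atom protein are assumed values chosen for illustration, not figures from the lecture, and `md_atom_updates` is a hypothetical helper name.

```python
# Back-of-the-envelope count of the work in a molecular dynamics run.
# Assumed values (not from the lecture): 1 fs time step, 5,000 atoms.
FS_PER_NS = 1_000_000  # femtoseconds in one nanosecond


def md_atom_updates(sim_ns: int, step_fs: int, n_atoms: int) -> int:
    """Number of per-atom updates needed to simulate sim_ns nanoseconds."""
    n_steps = sim_ns * FS_PER_NS // step_fs
    return n_steps * n_atoms


if __name__ == "__main__":
    # One nanosecond of motion for a 5,000-atom system at a 1 fs step.
    print(md_atom_updates(sim_ns=1, step_fs=1, n_atoms=5_000))
```

Each of these per-atom updates itself involves several force calculations, which is why even nanosecond-scale simulations are expensive.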

  14. Biochemical information → Analogy • genome sequencing and protein identification → words and spelling • gene and protein function → word meaning • biochemical pathway information → grammar and conjugation. Trying to understand life without knowledge of biochemical pathways would be like trying to understand Shakespeare without knowledge of English grammar.

  15. Definition of biochemical pathways: a series of related biochemical reactions. A pathway is a network of interacting molecules, representing the accumulated knowledge of various aspects of cellular processes, such as metabolic pathways, cell cycle pathways, developmental pathways and many other regulatory pathways. Proteins are the backbone of all biochemical pathways. A pathway is defined by the way its proteins interact with other molecules.

  16. Can a biologist fix a radio? Lazebnik, Cancer Cell, 2002

  17. Is all of this information sufficient to completely define the process of life? Not even close.

  18. Building models from parts lists

  19. Continuum of modeling approaches Top-down Bottom-up

  20. Data integration and statistical mining Need computational tools able to distill pathways of interest from large molecular interaction databases (top-down)

  21. Types of information to integrate • Data that determine the network (nodes and edges) • protein-protein • protein-DNA, etc… • Data that determine the state of the system • mRNA expression data • Protein modifications • Protein levels • Growth phenotype • Dynamics over time

  22. Post-genomic view of protein function

  23. The two major reasons for computing biochemical pathways: • Storage of current knowledge on all biochemical networks of all organisms in a reference database. • Prediction of networks given a set of biomolecules (proteins, genes, etc.)

  24. How to efficiently represent complex cellular pathway information in a computer?

  25. The layout of the protein-protein interaction map of yeast.

  26. KEGG: Kyoto Encyclopedia of Genes and Genomes “The primary objective of KEGG is to computerize the current knowledge of molecular interactions; namely, metabolic pathways, regulatory pathways, and molecular assemblies.” www.genome.jp/kegg/ From http://www.tokyo-center.genome.ad.jp/kegg/docs/intro.html

  27. Data organization in KEGG

  28. Connection between KEGG and other databases

  29. Levels of Abstraction It is clear that database storage of biochemical information requires us to consider the different levels of abstraction that occur in the data.

  30. Network representation and computation. Level of abstraction: flow of genetic information through different levels of abstraction: • Central dogma of molecular biology (abstraction at the sequence level): DNA → RNA → Protein • Thermodynamics principle (abstraction at the level of protein function): Sequence → Structure → Simple Function • Analysis of biological function (abstraction at the level of biochemical pathways): Interaction → Network → Complex Functionality

  31. Graph representation of a network

  32. Algorithms for biochemical pathways • The natural representation for biochemical pathways and networks is as a graph • with information represented as graphs, we can apply the powerful algorithms of graph theory to the problem
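One common way to hold such a graph in memory is an adjacency list. The sketch below is a minimal illustration, not the representation used in the lecture; `build_graph` and the node names are hypothetical.

```python
from collections import defaultdict


def build_graph(edges, directed=False):
    """Return an adjacency-list graph {node: set(neighbours)} from (u, v) pairs."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v]  # touch v so it exists as a key even without outgoing edges
        if not directed:
            graph[v].add(u)
    return dict(graph)


# Hypothetical toy pathway: the names are placeholders, not real annotations.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D")]
toy_graph = build_graph(toy_edges)
```

With the data in this form, standard graph algorithms (traversal, shortest paths, closure) can be applied directly to the pathway.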

  33. Network Representation • Regulatory interactions (protein-DNA): gene A regulates gene B • Functional complex, or B is a substrate of A (protein-protein): gene A binds gene B • Metabolic pathways: the reaction product of gene A is a substrate for gene B
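The three kinds of edge on this slide can be kept apart by labelling each edge with its type. The following is a sketch under assumed names: the `Interaction` record, the `kind` strings and the choice of which edges count as directed are illustrative, not taken from the lecture.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    source: str
    target: str
    kind: str       # "regulates", "binds" or "product_is_substrate_for"
    directed: bool  # assumption: binding is treated as symmetric


interactions = [
    Interaction("gene A", "gene B", "regulates", True),
    Interaction("gene A", "gene B", "binds", False),
    Interaction("gene A", "gene B", "product_is_substrate_for", True),
]


def edges_of_kind(items, kind):
    """Select the (source, target) pairs for one interaction type."""
    return [(i.source, i.target) for i in items if i.kind == kind]
```

Filtering by `kind` makes it possible to analyse the regulatory, physical and metabolic layers separately or together.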

  34. Representation of Metabolic Reactions

  35. Graphs • A graph G=(V,E) is a set of vertices V and a set of edges E • A subgraph G’ of G is induced by some V’ ⊆ V and E’ ⊆ E • Graph properties: • Connectivity (node degree, paths) • Cyclic vs. acyclic • Directed vs. undirected
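As a small illustration of these definitions, the sketch below extracts the subgraph induced by a chosen vertex subset and checks whether a graph is undirected. It assumes the adjacency-set representation {node: set(neighbours)}; the function names and the example graph are hypothetical.

```python
def induced_subgraph(graph, keep):
    """Subgraph on the vertices in `keep`, with every edge of G between them."""
    keep = set(keep)
    return {v: graph[v] & keep for v in graph if v in keep}


def is_undirected(graph):
    """True if every edge u -> v has a matching edge v -> u."""
    return all(u in graph.get(v, set()) for u in graph for v in graph[u])


# Hypothetical example: a triangle a-b-c with a tail c-d.
g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
sub = induced_subgraph(g, {"a", "b", "d"})
```

Here `sub` keeps the edge a-b but drops every edge that touched the removed vertex c, leaving d isolated.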

  36. Network Measures • Degree k_i • Degree distribution P(k) • Mean path length • Network diameter • Clustering coefficient
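The local measures in this list can be computed directly from an adjacency-set graph. The sketch below covers degree, degree distribution and the local clustering coefficient for an undirected graph; the function names and the example graph are assumptions for illustration.

```python
from collections import Counter


def degrees(graph):
    """Degree k_i of every node i."""
    return {v: len(nbrs) for v, nbrs in graph.items()}


def degree_distribution(graph):
    """P(k): fraction of nodes that have degree k."""
    counts = Counter(degrees(graph).values())
    return {k: c / len(graph) for k, c in counts.items()}


def clustering_coefficient(graph, v):
    """Fraction of pairs of v's neighbours that are themselves connected."""
    nbrs = graph[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Each neighbour-neighbour edge is seen from both ends, hence // 2.
    links = sum(1 for u in nbrs for w in graph[u] if w in nbrs) // 2
    return 2 * links / (k * (k - 1))


# Hypothetical example: a triangle a-b-c with a tail c-d.
g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
```

In this example node a sits inside the triangle and has clustering coefficient 1, while node c has three neighbours of which only one pair is connected.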

  37. Definition of graph isomorphism. Consider two graphs, G1=(V1,E1) and G2=(V2,E2). They are said to be isomorphic when they have the same number of vertices and there is a way to relabel the vertices of one graph with those of the other so that the edges of both graphs become identical. (Figure: two example graphs with vertices v1–v4 and h1–h4.)
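For very small graphs the definition can be tested literally by trying every relabelling. This brute-force sketch is only practical for a handful of vertices, since it checks up to n! permutations. The vertex names follow the slide, but the edge lists are invented for illustration because the figure's edges are not recoverable from the transcript.

```python
from itertools import permutations


def are_isomorphic(v1, e1, v2, e2):
    """Brute-force isomorphism test for small undirected graphs."""
    edges1 = {frozenset(e) for e in e1}
    edges2 = {frozenset(e) for e in e2}
    if len(v1) != len(v2) or len(edges1) != len(edges2):
        return False
    v1 = list(v1)
    for perm in permutations(v2):
        relabel = dict(zip(v1, perm))
        mapped = {frozenset((relabel[a], relabel[b])) for a, b in e1}
        if mapped == edges2:
            return True
    return False


# Hypothetical edges: both graphs are simple 4-vertex chains.
V1 = ["v1", "v2", "v3", "v4"]
E1 = [("v1", "v2"), ("v2", "v3"), ("v3", "v4")]
V2 = ["h1", "h2", "h3", "h4"]
E2 = [("h1", "h3"), ("h3", "h4"), ("h4", "h2")]
STAR = [("h1", "h2"), ("h1", "h3"), ("h1", "h4")]
```

The two chains are isomorphic, whereas a chain and the star `STAR` are not, even though they have the same numbers of vertices and edges.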

  38. Paths. A path is a sequence {x1, x2, …, xn} such that (x1,x2), (x2,x3), …, (xn-1,xn) are edges of the graph. A closed path (xn = x1) on a graph is called a graph cycle or circuit.
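These two definitions translate almost word for word into code. The sketch below assumes an adjacency-set graph; `is_path`, `is_cycle` and the example graph are hypothetical names and data.

```python
def is_path(graph, nodes):
    """True if every consecutive pair in `nodes` is an edge of the graph."""
    return all(b in graph.get(a, set()) for a, b in zip(nodes, nodes[1:]))


def is_cycle(graph, nodes):
    """True if `nodes` is a path that ends where it started."""
    return len(nodes) > 1 and nodes[0] == nodes[-1] and is_path(graph, nodes)


# Hypothetical example: a triangle a-b-c with a tail c-d.
g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
```

For instance, a, b, c, d is a path in this graph and a, b, c, a is a cycle, while a, d is not a path because no edge joins a and d.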

  39. Shortest-Path between nodes

  40. Shortest-Path between nodes
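For an unweighted graph, a shortest path between two nodes can be found with breadth-first search. This is a minimal sketch, assuming an adjacency-set graph and counting path length in edges; it is one standard approach, not necessarily the one shown in the slide figures, and the example graph is invented.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return one shortest path (fewest edges) as a list of nodes, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start via parents
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in graph.get(node, ()):
            if nbr not in parent:  # first visit is along a shortest route
                parent[nbr] = node
                queue.append(nbr)
    return None


# Hypothetical example: two routes from a to d, of 3 edges and 2 edges.
g = {
    "a": {"b", "e"}, "b": {"a", "c"}, "c": {"b", "d"},
    "d": {"c", "e"}, "e": {"a", "d"}, "z": set(),
}
```

The search returns the two-edge route through e and reports `None` for the isolated node z.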

  41. Longest Shortest-Path
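The longest shortest path is the network diameter, and averaging the same shortest-path lengths gives the mean path length. The sketch below computes both by running a breadth-first search from every node; it assumes a connected, unweighted, undirected graph with at least two nodes, and the function names are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def diameter_and_mean_path_length(graph):
    """Return (longest shortest path, mean shortest path) over all node pairs."""
    lengths = [
        d
        for s in graph
        for t, d in bfs_distances(graph, s).items()
        if t != s
    ]
    return max(lengths), sum(lengths) / len(lengths)


# Hypothetical example: a simple chain a-b-c-d.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
```

For the four-node chain the diameter is 3 (from a to d) and the mean path length is 10/6.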

  42. Small-world Network • Every node can be reached from every other node by a small number of hops or steps • High clustering coefficient and low mean shortest-path length • Random graphs don’t necessarily have high clustering coefficients • Social networks, the Internet, and biological networks all exhibit small-world network characteristics

  43. Path computation algorithms:
      • Transitive closure (Warshall’s algorithm). For every intermediate node vj and every pair of nodes vi, vk:
        - If node vi is connected to node vj
        - and node vj is connected to node vk
        - then connect node vi to node vk
      • Shortest paths for all pairs of nodes (Floyd’s algorithm). For every intermediate node vk and every pair of nodes vi, vj:
        - If vi can reach vk and vk can reach vj
        - compare the distance through vk with the current distance between vi and vj
        - keep the shorter of the two as the distance between vi and vj
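Both algorithms on this slide share the same triple loop over an intermediate node and a pair of endpoints. The sketch below shows one way to write them on matrix input; the function names, the use of `float("inf")` for "no edge" and the example matrices are assumptions for illustration.

```python
INF = float("inf")  # marks "no direct edge" in the distance matrix


def warshall(adj):
    """Transitive closure of a boolean adjacency matrix."""
    n = len(adj)
    reach = [row[:] for row in adj]
    for k in range(n):          # intermediate node
        for i in range(n):      # start node
            for j in range(n):  # end node
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    return reach


def floyd(dist):
    """All-pairs shortest path lengths from a matrix of edge weights."""
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:  # route via k is shorter
                    d[i][j] = d[i][k] + d[k][j]
    return d


# Hypothetical directed example: 0 -> 1 -> 2, plus a costly direct edge 0 -> 2.
adjacency = [[False, True, False], [False, False, True], [False, False, False]]
weights = [[0, 4, 10], [INF, 0, 1], [INF, INF, 0]]
```

In the example, the closure adds the implied connection from node 0 to node 2, and the shortest distance from 0 to 2 drops from 10 to 5 by going through node 1. Both routines run in time proportional to the cube of the number of nodes.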
