interMolecular Interactions & Network Inference Unit 17

interMolecular Interactions & Network Inference Unit 17 BIOL221T: Advanced Bioinformatics for Biotechnology Irene Gabashvili, PhD

From Previous Lectures • Name a pattern-driven algorithm for promoter prediction • Name a database used by pattern-driven algorithms for promoter prediction: TRANSFAC, PROMO • Viterbi – HMM algorithm – sequence driven • Dragon Promoter Finder (DPF) – sequence driven, based on promoter recognition models • NNPP2.1, Promoter2.0, PromoterInspector

B&O: Chapter 10 • Intermolecular Interactions and Biological Pathways • Databases • Algorithms • Resources • Visualization Tools • Integration with current high-throughput technologies

Introduction Nothing can happen in biology unless something binds to something else… Major challenge – gain an understanding of the workings of the cell by integrating available information from the various fields of molecular and cellular biology into an accurate model to generate hypotheses for testing

from Cary, Bader, Sander, FEBS Letters 579 (2005) 1815-20

Technologies to map pathways Sequencing, Microarrays, Proteomics, Chromatography, Gel-based separation techniques, NMR, (pharmaco)kinetics (enzymology – characterizing rates of reactions), etc.

The genomic era Human genome sequence “completed”, Feb 2001

Novel technologies → lots of data • http://www.genome.gov/25522225 • http://www.genome.gov/15015202 • http://www.illumina.com/pages.ilmn?ID=203 • Sample currently funded projects: • Differential metabolic network analysis of tumor progression • Purdue - TOOLS FOR DIFFERENTIAL METABOLOMICS

Genomics → Pathways • Massive amounts of genomic data are being obtained through: • Sequencing (pyro-, nanopore-, by synthesis, by hybridization) • Microarray technology • Algorithmic treatment of this data: • Gene classification and clustering • Genetic networks • More on Apr 2

Proteomics → Pathways • Massive amounts of proteomic data are being obtained through: • 2D gel electrophoresis • Mass spectrometry • Algorithmic treatment of this data: • Image analysis of 2D gels • Peptide sequencing and identification • Pathways? • More on Apr 9, 16

Protein-protein interaction data • Physical interactions • Yeast two-hybrid screens • Affinity purification (mass spec) • Peptide arrays • Protein-DNA by ChIP-chip • Other measures of “association” • Genetic interactions (double deletion mutants) • Genomic context (STRING)

Yeast two-hybrid method Y2H assays detect interactions in vivo. The method uses the property that transcription factors generally have separable transcriptional activation (AD) and DNA-binding (DBD) domains. A functional transcription factor can be created if a separately expressed AD can be made to interact with a DBD. A protein “bait” B is fused to a DBD and screened against a library of protein “preys”, each fused to an AD.

Mol. Dynamics – April 16, 23 • Molecular dynamics calculations are computationally intensive: for each time step (sub-ps), several calculations are needed for every atom in the molecule. • For a reasonably sized protein (several hundred amino acids, i.e. thousands of atoms) it takes many hours of supercomputing to map out the motions over nanoseconds • Fast laser dynamics experiments are just starting to measure time courses of individual molecular motions on the picosecond scale
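
To get a feel for the cost described above, the number of time steps and per-atom updates can be estimated with simple arithmetic. This is only a rough sketch: the 1 fs time step and the 5000-atom protein are assumed values chosen for illustration (the slide only says "sub-ps" and "thousands of atoms"), and `md_cost` is a hypothetical helper name.

```python
def md_cost(n_atoms, sim_time_s, time_step_s):
    """Return (number of time steps, number of per-atom updates)."""
    steps = round(sim_time_s / time_step_s)
    return steps, steps * n_atoms

# Assumed values: 5000 atoms, 1 ns of simulated time, 1 fs time step.
steps, updates = md_cost(n_atoms=5000, sim_time_s=1e-9, time_step_s=1e-15)
print(steps, updates)
```

Each per-atom update itself involves several force calculations, so the real cost is a multiple of the second number.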

Biochemical Information Analogy • genome sequencing and protein identification → words and spelling • gene and protein function → word meaning • biochemical pathway information → grammar and conjugation • Trying to understand life without knowledge of biochemical pathways would be like trying to understand Shakespeare without knowledge of English grammar.

Definition of biochemical pathways: A series of related biochemical reactions. A pathway is a network of interacting molecules, representing the accumulated knowledge of various aspects of cellular processes, such as metabolic pathways, cell cycle pathways, developmental pathways and many other regulatory pathways. Proteins are the backbone of all biochemical pathways. A pathway is defined by the way its proteins interact with other molecules.

Can a biologist fix a radio? Lazebnik, Cancer Cell, 2002

Is all of this information sufficient to completely define the process of life? Not even close.

Building models from parts lists

Continuum of modeling approaches Top-down Bottom-up

Data integration and statistical mining Need computational tools able to distill pathways of interest from large molecular interaction databases (top-down)

Types of information to integrate • Data that determine the network (nodes and edges) • protein-protein • protein-DNA, etc… • Data that determine the state of the system • mRNA expression data • Protein modifications • Protein levels • Growth phenotype • Dynamics over time
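
One way to picture the two kinds of data is to store the network (edges) and the state (here, mRNA expression) separately and combine them on demand. The sketch below is illustrative only: the node names, expression values, the threshold, and the helper `active_subnetwork` are all made up for the example.

```python
# Hypothetical network data: (node, node, interaction type)
edges = [
    ("P1", "P2", "protein-protein"),
    ("P2", "G3", "protein-DNA"),
    ("P1", "G4", "protein-DNA"),
]

# Hypothetical state data: mRNA expression level per node
expression = {"P1": 5.2, "P2": 0.3, "G3": 4.1, "G4": 3.8}

def active_subnetwork(edges, expression, threshold):
    """Keep only edges whose two end nodes are both expressed above threshold."""
    return [
        (a, b, kind)
        for a, b, kind in edges
        if expression.get(a, 0.0) >= threshold
        and expression.get(b, 0.0) >= threshold
    ]

active = active_subnetwork(edges, expression, threshold=1.0)
print(active)
```

The same pattern extends to other state data (protein levels, modifications, time series) by swapping the dictionary that is consulted.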

Post-genomic view of protein function

The two major reasons for computing biochemical pathways: • Storage of current knowledge on all biochemical networks of all organisms in a reference database. • Prediction of networks given a set of biomolecules (proteins, genes, etc.)

How to efficiently represent complex cellular pathway information in a computer?

Layout of the yeast protein-protein interaction map.

KEGG: Kyoto Encyclopedia of Genes and Genomes “The primary objective of KEGG is to computerize the current knowledge of molecular interactions; namely, metabolic pathways, regulatory pathways, and molecular assemblies.” www.genome.jp/kegg/ From http://www.tokyo-center.genome.ad.jp/kegg/docs/intro.html

Data organization in KEGG

Connection between KEGG and other databases

Levels of Abstraction It is clear that database storage of biochemical information requires us to consider the different levels of abstraction that occur in the data.

Network representation and computation. Levels of abstraction: flow of genetic information through different levels of abstraction: • Central dogma of molecular biology (abstraction at the sequence level): DNA → RNA → Protein • Thermodynamics principle (abstraction at the level of protein function): Sequence → Structure → Simple Function • Analysis of biological function (abstraction at the level of biochemical pathways): Interaction → Network → Complex Functionality

Graph representation of a network

Algorithms for biochemical pathways • The natural representation for biochemical pathways and networks is as a graph • with information represented as graphs, we can apply the powerful algorithms of graph theory to the problem
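
As a minimal sketch of what "representing a network as a graph" can look like in code, an adjacency-set dictionary is often enough. The function name `build_graph` and the interaction list are assumptions made for illustration, not part of the lecture material.

```python
def build_graph(edges, directed=False):
    """Build an adjacency-set dictionary from a list of (a, b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())      # make sure b exists as a node
        if not directed:
            graph[b].add(a)             # undirected: store both directions
    return graph

# Hypothetical interaction list
interactions = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
g = build_graph(interactions)
print(sorted(g["C"]))
```

With the data in this form, standard graph algorithms (traversal, shortest paths, clustering) can be applied directly.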

Network Representation • Regulatory interactions (protein-DNA): edge labelled “regulates” between gene A and gene B • Functional complex (protein-protein): edge labelled “binds” between gene A and gene B (B is a substrate of A) • Metabolic pathways: edge labelled “reaction product is a substrate for” between gene A and gene B
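
The three interaction types can be held in one structure by attaching a label to each edge. In the sketch below the labels `regulates`, `binds` and `substrate_for`, and the choice to treat only `binds` as undirected, are assumptions made for illustration.

```python
# Assumed convention: these edge labels are directed, all others undirected.
DIRECTED_KINDS = {"regulates", "substrate_for"}

def add_edge(graph, a, b, kind):
    """Add a labelled edge a -> b; mirror it if the label is undirected."""
    graph.setdefault(a, []).append((b, kind))
    graph.setdefault(b, [])
    if kind not in DIRECTED_KINDS:
        graph[b].append((a, kind))

network = {}
add_edge(network, "geneA", "geneB", "regulates")      # protein-DNA
add_edge(network, "protA", "protB", "binds")          # protein-protein
add_edge(network, "enzA", "enzB", "substrate_for")    # metabolic
print(network)
```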

Representation of Metabolic Reactions

Graphs • Graph G=(V,E) is a set of vertices V and edges E • A subgraph G’ of G is induced by some V’ ⊆ V and E’ ⊆ E • Graph properties: • Connectivity (node degree, paths) • Cyclic vs. acyclic • Directed vs. undirected
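
A small sketch of the subgraph idea, assuming the graph is stored as a dictionary of neighbour sets: choosing a vertex subset V’ and keeping only the edges with both ends in V’ gives the induced subgraph. The example graph and the name `induced_subgraph` are illustrative.

```python
def induced_subgraph(graph, nodes):
    """Return the subgraph induced by the given vertex subset."""
    keep = set(nodes)
    return {v: graph[v] & keep for v in graph if v in keep}

# Hypothetical undirected graph as adjacency sets
G = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

sub = induced_subgraph(G, {"A", "B", "D"})
print(sub)
```

Here D stays in the subgraph but becomes isolated, because its only neighbour C was not selected.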

Network Measures • Degree ki • Degree distribution P(k) • Mean path length • Network Diameter • Clustering Coefficient
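
The first two measures and the clustering coefficient can be sketched in a few lines. The code below uses the common local definition of the clustering coefficient (fraction of a node's neighbour pairs that are themselves connected); the example graph and function names are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def degree(graph, v):
    """Degree k_i: number of neighbours of node v."""
    return len(graph[v])

def degree_distribution(graph):
    """P(k): fraction of nodes that have degree k."""
    counts = Counter(len(nbrs) for nbrs in graph.values())
    n = len(graph)
    return {k: c / n for k, c in counts.items()}

def clustering_coefficient(graph, v):
    """Fraction of pairs of v's neighbours that are linked to each other."""
    nbrs = graph[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))

# Hypothetical undirected graph as adjacency sets
G = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
print(degree(G, "C"), degree_distribution(G), clustering_coefficient(G, "C"))
```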

Definition of graph isomorphism Let us consider two graphs: G1=(V1,E1) and G2=(V2,E2). They are said to be isomorphic when they have the same number of vertices and there is a way to relabel the vertices of one graph with those of the other so that the edge sets of the two graphs become identical. (Figure: two example graphs with vertices h1–h4 and v1–v4.)
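
The definition translates directly into a brute-force test: try every relabelling of the vertices and check whether the edge sets match. This is a sketch for small, simple, undirected graphs only (the number of relabellings grows factorially); the edge lists below are invented for the example and are not taken from the slide's figure.

```python
from itertools import permutations

def are_isomorphic(v1, e1, v2, e2):
    """Brute-force isomorphism test for small simple undirected graphs."""
    if len(v1) != len(v2) or len(e1) != len(e2):
        return False
    target = {frozenset(e) for e in e2}
    v1 = list(v1)
    for perm in permutations(v2):
        mapping = dict(zip(v1, perm))          # candidate relabelling
        mapped = {frozenset((mapping[a], mapping[b])) for a, b in e1}
        if mapped == target:
            return True
    return False

# Hypothetical example: two 4-cycles drawn with different labels
V1 = ["h1", "h2", "h3", "h4"]
E1 = [("h1", "h2"), ("h2", "h3"), ("h3", "h4"), ("h4", "h1")]
V2 = ["v1", "v2", "v3", "v4"]
E2 = [("v1", "v3"), ("v3", "v2"), ("v2", "v4"), ("v4", "v1")]
# A triangle with a pendant vertex: same vertex and edge counts, different shape
E3 = [("v1", "v2"), ("v2", "v3"), ("v3", "v1"), ("v3", "v4")]

print(are_isomorphic(V1, E1, V2, E2), are_isomorphic(V1, E1, V2, E3))
```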

Paths A path is a sequence {x1, x2, …, xn} such that (x1,x2), (x2,x3), …, (xn-1,xn) are edges of the graph. A closed path (xn = x1) on a graph is called a graph cycle or circuit.
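
Both definitions can be checked mechanically: a sequence is a path if every consecutive pair is an edge, and a cycle if it is additionally closed. A minimal sketch, assuming an adjacency-set dictionary and illustrative names:

```python
def is_path(graph, seq):
    """True if every consecutive pair in seq is an edge of the graph."""
    return all(b in graph.get(a, set()) for a, b in zip(seq, seq[1:]))

def is_cycle(graph, seq):
    """True if seq is a closed path (first node equals last node)."""
    return len(seq) > 1 and seq[0] == seq[-1] and is_path(graph, seq)

# Hypothetical undirected graph as adjacency sets
G = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
print(is_path(G, ["A", "B", "C", "D"]), is_cycle(G, ["A", "B", "C", "A"]))
```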

Shortest-Path between nodes
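
For an unweighted network, a shortest path between two nodes can be found with breadth-first search. The sketch below assumes an adjacency-set dictionary; the graph and the function name are illustrative, and weighted networks would need a different algorithm (for example Dijkstra's).

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Return one shortest path from start to goal, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    parent = {start: None}              # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:     # walk back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in graph[node]:
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return None

# Hypothetical undirected graph; E is an isolated node
G = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
    "E": set(),
}
print(shortest_path(G, "A", "D"))
```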

Longest Shortest-Path
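
The longest shortest path is the network diameter, and averaging the same distances gives the mean path length. A minimal sketch for unweighted graphs, assuming an adjacency-set dictionary; unreachable pairs are simply skipped, which is one of several possible conventions.

```python
from collections import deque

def bfs_distances(graph, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def diameter_and_mean_path_length(graph):
    """Return (longest shortest path, mean shortest path) over reachable pairs."""
    lengths = []
    for s in graph:
        for t, d in bfs_distances(graph, s).items():
            if t != s:
                lengths.append(d)
    if not lengths:
        return 0, 0.0
    return max(lengths), sum(lengths) / len(lengths)

# Hypothetical chain A - B - C - D
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
diameter, mean_len = diameter_and_mean_path_length(chain)
print(diameter, mean_len)
```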

Small-world Network • Every node can be reached from every other by a small number of hops or steps • High clustering coefficient and low mean-shortest path length • Random graphs don’t necessarily have high clustering coefficients • Social networks, the Internet, and biological networks all exhibit small-world network characteristics
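
The combination of high clustering and short paths can be illustrated on a toy example: a ring lattice is highly clustered but has long paths, and adding a few long-range shortcuts shortens the paths while leaving clustering largely intact. This is only a sketch; the lattice size, the two hand-picked shortcuts, and all function names are assumptions, and it is not the random rewiring procedure used in the small-world literature.

```python
from collections import deque
from itertools import combinations

def ring_lattice(n, k):
    """Each node is linked to its k nearest neighbours on a ring (k even)."""
    graph = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            graph[i].add(j)
            graph[j].add(i)
    return graph

def mean_clustering(graph):
    """Average local clustering coefficient over all nodes."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
            total += 2.0 * links / (k * (k - 1))
    return total / len(graph)

def mean_path_length(graph):
    """Average hop distance over all reachable ordered node pairs."""
    total, pairs = 0, 0
    for s in graph:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

lattice = ring_lattice(20, 4)
shortcut = {v: set(nbrs) for v, nbrs in lattice.items()}
for a, b in [(0, 10), (5, 15)]:         # two hand-picked long-range links
    shortcut[a].add(b)
    shortcut[b].add(a)

print(mean_clustering(lattice), mean_path_length(lattice))
print(mean_clustering(shortcut), mean_path_length(shortcut))
```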

Path computation algorithms:
• Transitive closure (Warshall’s algorithm). For every triple of nodes vi, vj, vk of the graph:
 - If node vi is connected to node vj
 - and node vj is connected to node vk
 - then connect node vi to node vk
• Shortest paths for all pairs of nodes (Floyd’s algorithm). For every pair of nodes vi, vj of the graph:
 - Look for any intermediate node vk between vi and vj
 - Compute the distance between vi and vj as the shorter of the current distance and the route through vk
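
Both algorithms share the same triple-loop structure and can be sketched on small matrices. In the code the intermediate node is indexed by `k` (the outer loop), which is the usual textbook arrangement; the example matrices, edge weights, and function names are assumptions for illustration.

```python
INF = float("inf")

def warshall(adj):
    """Transitive closure: reach[i][j] is True if j is reachable from i."""
    n = len(adj)
    reach = [row[:] for row in adj]
    for k in range(n):                  # k is the intermediate node
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    return reach

def floyd(dist):
    """All-pairs shortest path lengths from a weighted distance matrix."""
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):                  # allow paths through node k
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

# Hypothetical directed graph: 0 -> 1 -> 2, plus a long direct edge 0 -> 2
adj = [
    [False, True, False],
    [False, False, True],
    [False, False, False],
]
weights = [
    [0, 1, 10],
    [INF, 0, 2],
    [INF, INF, 0],
]
closure = warshall(adj)
shortest = floyd(weights)
print(closure[0][2], shortest[0][2])
```

Both run in time proportional to the cube of the number of nodes, which is fine for small pathways but becomes expensive for genome-scale networks.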
