Network motifs: discovery and applications

Network motifs: discovery and applications Guy Zinman Seminar in Bioinformatics Technion, Spring 2005

Outline • Theory of network motifs • Definition, Algorithm • Application to E. Coli transcription network • The dynamic behavior of the motifs • Finding active subnetworks • Simulated annealing • experiments

Network

Network • Dictionary definition: • A group or system of (electric) components and connecting circuitry designed to function in a specific manner. • Network is the backbone of a complex system • Studies of networks are similar to paleontology: learning about an animal from its backbone

Network motifs • The notion of motif, widely used for sequence analysis, is generalized to the level of networks. • Network Motifs are defined as patterns of interconnections that recur in many different parts of a network at frequencies much higher than those found in randomized networks.

Network motifs (cont.) Such motifs are found in networks from: • Biochemistry • Transcriptional regulation networks • Neurobiology • Neuron connectivity • Ecology • Food webs • Engineering • Electoronic circuits • World Wide Web

Network motifs (cont.)

Schematic view of motif detection • Occurrence of the FFL motif:

Random vs designed/evolved features • Large networks may contain information about design principles and/or evolutionof the complex system • Which features are there for a reason: • design principles (e.g. feed-forward loops) • constraints (e.g. the all nodes on the Internet must be connected to each other) • evolution, growth dynamics (e.g. network growth is mainly due to gene duplication)

Network motifs • Alon U. et al: “Network Motifs: Simple building Blocks of Complex Networks”; Science, 2002. • Different motifs were found in different classes of network. • The motif reflect the underlying processes that generate each type of network.

Motifs detected • Two significant motifs: Both appeared numerous times in non-homologous gene systems that perform diverse biological functions

Motifs detected

Main tasks for detecting network motifs There are two main tasks in detecting network motifs: (1) generating an ensemble of proper random networks (2) counting the subgraphs in the real network and in random networks.

The algorithm • Starting point: graph with directed edges • Scan for n-node subgraphs (n=3,4) and count number of occurrences • Compare to Erdos-Renyi randomized graph • (randomization preserves in-, out- and in+out- degree of each node)

All 3-node connected subgraphs • 13 different isomorphic types of 3-node connected subgraph • There are: 199 4-node subgraphs, 9364 5-node subgraphs ……

Generation of randomized network • Algorithm A • Employ a Markov-chain algorithm based on starting with the real network and repeatedly swapping randomly chosen pairs of connections (X1 => Y1, X2 => Y2 is replaced by X1 => Y2, X2 => Y1) until the network is well randomized. • Switching is prohibited if the either of the connections X1 => Y2 or X2 => Y1 already exist.

Generation of randomized network • Algorithm B • Each network was presented as a connectivity matrix M, such that Mij = 1 if there is a connection directed from node i to node j, and 0 otherwise. • The goal is to create a randomized connectivity matrix Mrand, which has the same number of nonzero elements in each row and column as the corresponding row and column of the real connectivity matrix.

Generation of randomized network • Ri = ∑jMrand,ij = ∑jMij, Ci = ∑iMrand,ij = ∑iMij. • To generate the randomized networks, we start with an empty matrix Mrand. • We then repeatedly randomly choose a row n according to the weights pi = Ri/∑Ri and a column m according to the weights qj = Rj/∑Rj. • If Mrand,nm = 0, we set Mrand,mn = 1. • We then set Rm = Rm – 1 and Cn = Cn – 1. If the entry (m, n) was previously entered to the randomized matrix, that is, ifMrand,mn = 1, or if m = n, we choose a new (m, n). • This process is repeated until all Ri = 0 and Cj = 0.

Network motif detection • For each nonzero element (i,j): Looping through all connected elements Mik = 1, Mki = 1, Mjk = 1, and Mkj = 1. This is recursively repeated with elements (i, k), (k, i), (j,k), and (k, j) until an n-node subgraph is obtained. • A table is formed that counts the number of appearances of each type of subgraph in the network, correcting for the fact that multiple submatrices of M can correspond to one isomorphic architecture owing to symmetries.

Network motif detection • This process is repeated for each of the randomized networks. The number of appearances of each type of subgraph in the random ensemble is recorded, to assess its statistical significance. • The present concepts and algorithms are easily generalized to nondirected or directed graphs with several “colors” of edges and nodes, multipartite graphs, and so forth.

Criteria for Network Motif Selection • The probability that it appears in a randomized network an equal or greater number of times than in the real network is smaller than P = 0.01. • Reminder: • p-value: the probability to get the given resultwhen the tested subjectis not affected by the experiment. • if p-value < 0.01 than the subject is considered to be affected (the hypothesis is correct).

Run time complexity • The performance of this algorithm scales with the total number of n-node subgraphs in the network. • The number of subgraphs and the algorithm runtime also increase dramatically for subgraphs with n ≥ 5.

Sampling method for subgraph counting • Kashtan et al.: “Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs”; Bioinformatics, 2004. • This algorithm samples subgraphs in order to estimate their relative frequency. • The runtime of the algorithm asymptotically does not depend on the network size. • Surprisingly, few samples are needed to detect network motifs reliably.

Subgraph sampling Procedure description: • pick a random edge from the network and then expand the subgraph iteratively by picking random neighboring edges until the subgraph reaches n nodes. • For each random choice of an edge, in order to pick an edge that will expand the subgraph size by one, prepare a list of all such candidate edges and then randomly choose an edge from the list.

Subgraph sampling • Finally, the sampled subgraph is defined by the set of n nodes and all the edges that connect between these nodes in the original network. • Finding n-node subgraphs for n ≥5 is much easier now….

Comparing sampling method results with exhaustive enumeration

Transcriptional Regulation Network ofEscherichia coli • Operon – a group of contiguous genes that are transcribed into a single mRNA molecule. • The transcriptional network is represented as a directed graph: each operon represents a node and edges represent direct transcriptional interactions.

Application to E. Coli Alon U.: “Network motifs in the transcriptional regulation network of Eschersichia coli”; Nature Genetics, 2002. • Database - RegulonDB contains interactions between Transcription Factors and the operons they regulate • Contains 577 interactions, 424 operons and 116 TFs • 35 more TFs were added from literature • Previously described algorithm was run on this data (1000 random networks)

Significant motifs Feedforward loop found in 22 different systems, 10 TFs and 40 operons P-Val=0.001

Concentration of FFL

Same in the yeast regulatory network • Young et. al: Transcriptional Regulatory Networks in Saccharomyces cerevisiae; Science, 2002

Can you think of a possible role for this motif?

Dynamics for the FFL

Mangan et al., “Structure and function of the feed-forward loop”; PNAS, 2003. Consider Sx and Sy as Input signal – small molecules That activate or inhibit the Activity of X and Y.

Coherency of FFLs • The FFL is ‘coherent’ if the direct effect of the general TF on the effector has the same sign. • 85% of the FFL found were coherent.

Significant motif Single Input Motif (SIM) • Single Transcription Factor controls set of operons. • All operons in a SIM are regulated with the same sign. • Appeared in 24 different systems

Dynamics for the SIM

Significant motif Dense Overlapping Regulon (DOR) - a layer of overlapping interactions between operons and a group of TFs, much denser than this structure would appear in an Erdos-Renyi random graph

E. Coli network

Dor detection Briefly… • Define a (nonmetric) distance measure between operon k and j. • The operons were clustered. • DORs corresponded to clusters with more than C=10 connections, with ratio of connections to TF greater than R=2.

mFinder • A software tool for estimating subgraph concentrations and detecting network motifs. • www.weizmann.ac.il/mcb/UriAlon/

Discussion • The concept of homology between genes based on sequence motifs has been crucial for understanding the function of uncharacterized genes. • Likewise, the notion of similarity between connectivity patterns in networks, based on network motifs, may be helpful in gaining insight into the dynamic behavior of newly identified gene circuits.

Discussion • Until now we considered only transcription interactions specifically manifested by transcription factors that bind regulatory sites. • This transcriptional network can be thought of as ‘slow’ part of the cellular regulation network (time scale of minutes).

Discussion • An additional layer of faster interactions, which include interaction between proteins (often subsecond timescale), contributes to the full regulatory behavior.

Finding active subnetworks • Ideker, T.: “Discovering regulatory and signaling circuits in molecular interaction networks”; Bioinformatics, 2002. • Integrates protein-protein and protein-DNA interactions with mRNA expression data, in a goal of better understanding the molecular mechanism of the observed gene expression. • Uses a method of searching the network to find ‘active subnetwork’, i.e., connected sets of genes with unexpectedly high levels of differential expression, under one or more perturbation.

Methodology • Using a molecular interaction network to analyze changes in expression over 20 perturbations to the yeast galactose utilization (GAL) pathway. • Determining which conditions significantly affected the gene expression in each active subnetwork.

The means • Combining a rigorous statistical measure for scoring subnetworks with a search algorithm for identifying subnetworks with high score.

Basic z-score calculation • To rate the biological activity of a particular subnetwork, begin with assessing the significance of differential expression for each gene. • The error model provided by VERA (Variability and ERror Assessment) program. • VERA estimates the parameters of a statistical model using the method of maximum likelihood. • Output: p-values (pi), representing the significance of expression change.

Network motifs: discovery and applications