Reinhard Laubenbacher Virginia Bioinformatics Institute and Mathematics Department Virginia Tech

Discrete models of biological networks. Segunda Escuela Argentina de Matematica y Biologia, Cordoba, Argentina, June 29, 2007

Topics • Boolean networks and cellular automata (including probabilistic and sequential BNs) • Polynomial dynamical systems over finite fields • Logical models • Dynamic Bayesian networks

Boolean networks Definition. Let f1, …, fn be Boolean functions in the variables x1, …, xn. A Boolean network is a time-discrete dynamical system f = (f1, …, fn) : {0, 1}^n → {0, 1}^n. The state space of f is the directed graph with the elements of {0, 1}^n as nodes. There is a directed edge b → c iff f(b) = c.

Boolean networks f1 = NOT x2, f2 = x4 OR (x1 AND x3), f3 = x4 AND x2, f4 = x2 OR x3
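
The state space of this 4-node example can be enumerated directly. The sketch below is only an illustration: it assumes the first function reads f1 = NOT x2, and the names f, state_space, and fixed_points are chosen here, not taken from the slides.

```python
from itertools import product

def f(state):
    """One synchronous update of the 4-node Boolean network."""
    x1, x2, x3, x4 = state
    return (
        1 - x2,          # f1 = NOT x2 (assumed reading of the first function)
        x4 | (x1 & x3),  # f2 = x4 OR (x1 AND x3)
        x4 & x2,         # f3 = x4 AND x2
        x2 | x3,         # f4 = x2 OR x3
    )

# State space: one directed edge b -> f(b) for each of the 16 states.
state_space = {b: f(b) for b in product((0, 1), repeat=4)}

# Fixed points are the states whose edge is a self-loop.
fixed_points = sorted(b for b, c in state_space.items() if b == c)
print(fixed_points)
```

Storing the state space as a dictionary b → f(b) works because every node of the graph has exactly one outgoing edge.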

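The phase plane [Figure: phase plane with axes Compound x and Compound y for the system dx/dt = f(x, y), dy/dt = g(x, y). At a point (xo, yo) the direction vector has components dx = f(xo, yo) dt and dy = g(xo, yo) dt. Courtesy J. Tyson]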

Cellular automata Definition. A 1-dimensional (binary) cellular automaton (CA) is a Boolean network f in which fi depends only on some or all of x_{i-1}, xi, x_{i+1} (indices modulo n). Example. fi = x_{i-1} XOR x_{i+1}.

Example [Figure: successive states of the cellular automaton, one row per time step: the initial state followed by the states at t = 1, …, 9]

Rule 90 with 5 nodes f(x1,x2,…,x5) = (x5 XOR x2, x1 XOR x3, … , x4 XOR x1)
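
A short sketch of this update rule on a ring of cells. The initial state (1, 0, 0, 0, 0) and the helper names rule90_step and trajectory are assumptions made for illustration.

```python
def rule90_step(x):
    """Synchronous update: cell i becomes x[i-1] XOR x[i+1], indices modulo n."""
    n = len(x)
    return tuple(x[(i - 1) % n] ^ x[(i + 1) % n] for i in range(n))

def trajectory(x, steps):
    """Return the initial state followed by `steps` successive updates."""
    states = [tuple(x)]
    for _ in range(steps):
        states.append(rule90_step(states[-1]))
    return states

# Hypothetical initial state: a single ON cell on a ring of 5 nodes.
for t, s in enumerate(trajectory((1, 0, 0, 0, 0), 9)):
    print(t, "".join(map(str, s)))
```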

Boolean network models in biology Stuart A. Kauffman, Metabolic stability and epigenesis in randomly constructed genetic nets, J. Theor. Biol. 22 (1969) 437-467. Boolean networks as models for genetic regulatory networks: nodes = genes, functions = gene regulation. Variable states: 1 = ON, 0 = OFF.

Polynomial dynamical systems x AND y = xy; x OR y = x + y + xy; NOT x = x + 1; (x XOR y = x + y). Note: {0, 1} = k has a field structure (1 + 1 = 0). Fact: Any Boolean function in n variables can be expressed uniquely as a polynomial function in k[x1, …, xn] / <xi^2 – xi>, and conversely.
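
The four translation rules can be checked by brute force over {0, 1}, with all arithmetic done modulo 2. This is only a sanity check of the identities, not an argument for the uniqueness statement; the function name check_identities is chosen here.

```python
from itertools import product

def check_identities():
    """Compare each Boolean operation with its polynomial form over k = {0, 1}."""
    for x, y in product((0, 1), repeat=2):
        if (x & y) != (x * y) % 2:          # x AND y = xy
            return False
        if (x | y) != (x + y + x * y) % 2:  # x OR y = x + y + xy
            return False
        if (x ^ y) != (x + y) % 2:          # x XOR y = x + y
            return False
    # NOT x = x + 1
    return all((1 - x) == (x + 1) % 2 for x in (0, 1))

print(check_identities())
```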

Polynomial dynamical systems Let k be a finite field and f1, …, fn ∈ k[x1, …, xn]. Then f = (f1, …, fn) : k^n → k^n is an n-dimensional polynomial dynamical system over k. This is a natural generalization of Boolean networks. Fact: Every function k^n → k can be represented by a polynomial, so all finite dynamical systems k^n → k^n are polynomial dynamical systems.

Example k = F3 = {0, 1, 2}, n = 3; f1 = x1 x2^2 + x3, f2 = x2 + x3, f3 = x1^2 + x2^2. [Figure: dependency graph (wiring diagram) of f]
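
A sketch of this example, reading the flattened exponents as f1 = x1 x2^2 + x3 and f3 = x1^2 + x2^2. The wiring diagram is recovered numerically: there is an edge xi → xj when changing xi alone can change the value of fj. The helper depends_on and the variable names are assumptions made for illustration.

```python
from itertools import product

P = 3  # k = F_3

def f(state):
    """Synchronous update of the 3-variable system over F_3."""
    x1, x2, x3 = state
    return ((x1 * x2**2 + x3) % P, (x2 + x3) % P, (x1**2 + x2**2) % P)

def depends_on(j, i):
    """True if coordinate function j changes when only variable i is changed (0-based)."""
    for s in product(range(P), repeat=3):
        for v in range(P):
            t = s[:i] + (v,) + s[i + 1:]
            if f(s)[j] != f(t)[j]:
                return True
    return False

# Edges (i, j) mean x_i -> x_j; indices are 1-based to match the slides.
edges = sorted((i + 1, j + 1) for i in range(3) for j in range(3) if depends_on(j, i))
fixed_points = [s for s in product(range(P), repeat=3) if f(s) == s]
print(edges)
print(fixed_points)
```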

Sequential polynomial systems k = F3 = {0, 1, 2}, n = 3; f1 = x1 x2^2 + x3, f2 = x2 + x3, f3 = x1^2 + x2^2. σ = (2 3 1) update schedule: First update x2 using f2. Then x3 using f3, with the new value of x2. Then x1 using f1, with the new values of x2 and x3.
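
The difference between parallel and sequential update can be seen on a single state. This is a minimal sketch assuming the exponent reading f1 = x1 x2^2 + x3, f3 = x1^2 + x2^2; the names LOCAL, parallel_step, and sequential_step are chosen here.

```python
P = 3  # k = F_3

# Local update functions, keyed by the 1-based variable index.
LOCAL = {
    1: lambda x1, x2, x3: (x1 * x2**2 + x3) % P,
    2: lambda x1, x2, x3: (x2 + x3) % P,
    3: lambda x1, x2, x3: (x1**2 + x2**2) % P,
}

def parallel_step(state):
    """All variables are updated at once from the old state."""
    return tuple(LOCAL[i](*state) for i in (1, 2, 3))

def sequential_step(state, schedule=(2, 3, 1)):
    """Variables are updated one at a time; later updates see the new values."""
    x = list(state)
    for i in schedule:
        x[i - 1] = LOCAL[i](*x)
    return tuple(x)

print(parallel_step((1, 2, 0)), sequential_step((1, 2, 0)))
```

On the state (1, 2, 0) the two update modes already disagree in the first coordinate, because f1 is evaluated with the new value of x3 in the sequential case.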

Sequential systems as biological models • Different regulatory processes happen on different time scales • Stochastic effects in the cell affect the “update order” of variables representing different chemical compounds at any given time. Therefore, sequential update in models of regulatory networks adds a realistic feature.

Stochastic models Polynomial dynamical systems (PDSs) can be modified: • Choose random update order for each update (see Sontag et al. for Boolean case) • Choose an update function at random from a collection at each update (see Shmulevich et al. for Boolean case)
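
The first modification (a fresh random update order at every step) can be sketched as follows. The system used is the F3 example from the earlier slides, chosen here only for illustration; the cited work treats the Boolean case. The seed and the names LOCAL and random_order_step are assumptions.

```python
import random

P = 3  # k = F_3

# Local update functions of the F_3 example (0-based positions in the state).
LOCAL = (
    lambda x: (x[0] * x[1]**2 + x[2]) % P,
    lambda x: (x[1] + x[2]) % P,
    lambda x: (x[0]**2 + x[1]**2) % P,
)

def random_order_step(state, rng):
    """Update every variable once, sequentially, in a freshly drawn random order."""
    x = list(state)
    order = list(range(len(x)))
    rng.shuffle(order)
    for i in order:
        x[i] = LOCAL[i](x)
    return tuple(x)

rng = random.Random(0)  # fixed seed so that a run is reproducible
state = (1, 2, 0)
for _ in range(5):
    state = random_order_step(state, rng)
    print(state)
```

A state that every local function maps to itself, such as (0, 0, 0) here, stays fixed under any update order.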

Open mathematical problems • Determine the relationship between the structure of the fi and the dynamics of the system for special classes of models (see later lectures). • Determine the effect of the update schedule on dynamics. • Develop a categorical framework for (sequential/stochastic) PDSs. • Determine and study a good class of “biologically meaningful” polynomial functions.

Example • Jarrah, B. Raposa, and R. Laubenbacher, Nested canalyzing, unate cascade, and polynomial functions, Physica D, in press

Logical models E. Snoussi and R. Thomas Logical identification of all steady states: the concept of feedback loop characteristic states Bull. Math. Biol. 55 (1993) 973-991 Key model features: • Time delays of different lengths for different variables are important • Positive and negative feedback loops are important

Model description Basic structure of logical models: 1. Sets of variables x1, …, xn; X1, …, Xn (Xi = genes and xi = gene products, e.g., proteins; a gene product x regulates a gene Y with a certain time delay). Each variable pair xi, Xi takes on a finite number of distinct states or thresholds (possibly different for different i), corresponding to different modes of action of the variables for different concentration levels.

Model description (cont.) 2. A directed weighted graph with the xi as nodes and threshold levels, indicating regulatory relationships and at what levels they occur. Each edge has a sign, indicating activation (+) or inhibition (-). 3. A collection of “logical parameters” which can be used to determine the state transition of a given node for a given configuration of inputs.

Features of logical models • Sophisticated models that include many features of real networks • Ability to construct continuous models based on the logical model specification • Models encode intuitive network properties • Ability to relate structure (+ and - feedback loops) to dynamics (multistationarity, fixed points vs. periodic behavior)

An Example [Figure with nodes x, y, z]

Features of logical models • Include many features of real biological networks • Intuitive but complicated formalism and model description • Difficult to study as a mathematical object • Difficult to study dynamics for larger models

Dynamic Bayesian networks Definition. A Bayesian network (BN) is a representation of a joint probability distribution over a set X1, … , Xn of random variables. It consists of • an acyclic graph with the Xi as vertices. A directed edge indicates a conditional dependence relation • a family of conditional distributions for each variable, given its parents in the graph

An example http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html#repr

Inference Cond. Prob.: P(A | B) = P(A∩B)/P(B) Bayes’ rule: P(R=r | e) = P(e | R=r)P(R=r)/P(e)
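
A small numeric illustration of Bayes' rule for a binary variable R, where P(e) is expanded by the law of total probability. The probabilities are made up for illustration and the function name posterior is chosen here.

```python
def posterior(prior, likelihood, likelihood_if_not):
    """P(R=r | e) from P(R=r), P(e | R=r), and P(e | R!=r), for a binary variable R."""
    # P(e) = P(e | R=r) P(R=r) + P(e | R!=r) P(R!=r)
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: P(R=r) = 0.2, P(e | R=r) = 0.9, P(e | R!=r) = 0.1.
print(posterior(0.2, 0.9, 0.1))
```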

BN models of gene regulatory networks Can use BNs to model gene regulatory networks: random variables Xi ↔ genes; directed edges ↔ regulatory relationships. Problem: BNs cannot have directed loops, and hence cannot model feedback loops.

Dynamic Bayesian networks Definition. A dynamic Bayesian network (DBN) is a representation of the stochastic evolution of a set of random variables {Xi}, using discrete time. It has two components: • a directed graph (V, E) encoding conditional dependence conditions (as before); • a family of conditional probability distributions P(Xi(t) | Pai(t-1)), where Pai = {Xj | (Xj, Xi) ∈ E} (Dojer et al., BMC Bioinformatics (2006) 7)

Dynamic Bayesian networks DBNs generalize Hidden Markov Models and linear dynamical systems. Recently used for inference of gene regulatory networks from time courses of microarray data.

Summary Modeling frameworks: • Boolean networks • Polynomial dynamical systems • Logical models • Dynamic Bayesian networks • (Petri nets)

Model inference from data Goal: Given a set of experimental observations, infer the most likely model of the network that generated the data. Model framework: polynomial dynamical systems over a finite field

Data discretization Step 1: Discretize real-valued data into finitely many states. This is a difficult problem. E. Dimitrova, P. Vera-Licona, J. McGee, and R. Laubenbacher, Comparison of data discretization methods for inference of biochemical networks.

Model inference from data Variables x1, …, xn with values in a finite field k. (s1, t1), …, (sr, tr) state transition observations with sj, tj ∈ k^n. Goal: Identify a collection of “best” dynamical systems f = (f1, …, fn): k^n → k^n such that f(sj) = tj for all j.

Network inference Problem: Given D = {(sj, tj) ∈ k^n × k}, find the “most likely” model f: k^n → k such that f(sj) = tj for all j. Let M = {f: k^n → k | f(sj) = tj for all j} be the subset of k[x1, …, xn] of all possible models for a particular variable.

Network inference Let f, g ∈ M. Then f(sj) = g(sj) for all j, so (f - g)(sj) = 0 for all j. Let I = {h ∈ k[x1, …, xn] | h(sj) = 0 for all j}. Let f0 be any element of M. Then M = f0 + I. Note that I is an ideal, since it is closed under addition and under multiplication by arbitrary polynomials.
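
A minimal sketch of the coset structure of M, assuming k = F3 and a small made-up data set (both hypothetical, not from the slides). The particular solution is built from indicator polynomials, which is one standard way to interpolate over a finite field; the slides do not say how the particular solution is computed. All names are chosen here.

```python
P = 3  # hypothetical choice of field: k = F_3

# Hypothetical observations (s_j, t_j) for a single variable, with n = 2.
DATA = [((0, 1), 2), ((1, 1), 0), ((2, 0), 1)]

def indicator(s):
    """Polynomial function equal to 1 at s and 0 elsewhere (uses a^(P-1) = 1 for a != 0)."""
    def delta(x):
        value = 1
        for xi, si in zip(x, s):
            value = value * (1 - pow(xi - si, P - 1, P)) % P
        return value
    return delta

def f0(x):
    """A particular model: the sum of t_j times the indicator of s_j."""
    return sum(t * indicator(s)(x) for s, t in DATA) % P

def h(x):
    """An element of I: it vanishes at every s_j and equals 1 elsewhere."""
    value = 1
    for s, _ in DATA:
        value = value * (1 - indicator(s)(x)) % P
    return value

def g(x):
    """Another element of the coset f0 + I; it fits the data equally well."""
    return (f0(x) + h(x)) % P

print([f0(s) for s, _ in DATA], [g(s) for s, _ in DATA])
```

Both functions reproduce the data but differ away from it, for example at (0, 0), which is why a selection criterion on the coset is needed.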

Model selection In the absence of additional network information, choose a “minimal” model f from M (f only reflects relationships among variables that are inherent in the data). If f = hg + f’, with g ∈ I and f’ not divisible by any r ∈ I, then f’ is preferable to f because hg vanishes on all sj.

Model selection Strategy: • Compute f0 ∈ M and the coset f0 + I. • Compute f ∈ f0 + I with the property that f is not divisible by any g ∈ I. Could use other criteria for model selection: f must contain certain variables and can’t contain others. Could also require certain constraints on the dynamics.

Fundamental computational problem Given I and f, decide whether f ∈ I. If not, compute the remainder of f under “division by I.” This is known as the “ideal membership problem.” This problem can be solved by Gröbner basis theory.

Wiring diagrams Goal: Compute all possible minimal wiring diagrams for a given data set. Wiring diagram: Vertices = variables. Edges: xi → xj if xi is involved in the regulation of xj, that is, if xi appears in fj.

Wiring diagrams Problem: Given data (si, ti), i = 1, …, r (a collection of state transitions for one node in the network), find all minimal (wrt inclusion) sets of variables {y1, …, ym} ⊆ {x1, …, xn} such that (f0 + I) ∩ k[y1, …, ym] ≠ Ø. Each such minimal set corresponds to a minimal wiring diagram for the variable under consideration.

The “minimal sets” algorithm For a ∈ k, let Xa = {si | ti = a}. Let X = {Xa | a ∈ k}. Then f0 + I = M = {f ∈ k[x1, …, xn] | f(p) = a for all p ∈ Xa and all a ∈ k}. Want to find f ∈ M which involves a minimal set of variables, i.e., there is no g ∈ M whose support is properly contained in supp(f).

Example Let n = 5, k = F5. Let (s1, t1) = [(3, 0, 0, 0, 0); 3], (s2, t2) = [(0, 1, 2, 1, 4); 1], (s3, t3) = [(0, 1, 2, 1, 0); 0], (s4, t4) = [(0, 1, 2, 1, 1); 0], (s5, t5) = [(1, 1, 1, 1, 3); 4]. Then X0 = {s3, s4}, X1 = {s2}, X2 = Ø, X3 = {s1}, X4 = {s5}.

The algorithm Definitions. • For F ⊆ {1, …, n}, let RF = k[xi | i ∉ F] (the polynomial ring in the variables outside F). • Let ΔX = {F | M ∩ RF ≠ Ø}. • For p ∈ Xa, q ∈ Xb, a ≠ b ∈ k, let m(p, q) = ∏_{pi ≠ qi} xi, the product of the xi over all coordinates i in which p and q differ. Let MX = monomial ideal in k[x1, …, xn] generated by all monomials m(p, q) for all a ≠ b ∈ k. (Note that ΔX is closed under taking subsets, so it is a simplicial complex, and MX is the face ideal of the Alexander dual of ΔX.)

The algorithm Proposition. A subset F of {1, …, n} is in ΔX if and only if the ideal <xi | i ∉ F> contains the ideal MX. Proof. Let F ∈ ΔX. Then M ∩ RF ≠ Ø. Let p ∈ Xa and q ∈ Xb, with a ≠ b. Then there is f ∈ k[xi | i ∉ F] such that f(p) = a and f(q) = b. So p and q differ in a coordinate j ∉ F. Hence m(p, q) contains xj as a factor, so it is contained in <xj | j ∉ F>. Therefore, MX ⊆ <xi | i ∉ F>.

The algorithm Conversely, suppose MX ⊆ <xi | i ∉ F>. Then every generator m(p, q) is divisible by some xi with i ∉ F. Therefore, any p ∈ Xa and q ∈ Xb with a ≠ b differ in some coordinate i ∉ F, so the coordinates outside F already separate the sets Xa. Define f to be the polynomial function in the variables xi, i ∉ F, that takes the value a at the projection of each p ∈ Xa, for all a ∈ k, and the value 0 at all other points. Then f ∈ M and depends only on the variables xi, i ∉ F. Hence f ∈ M ∩ RF. This completes the proof.
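
The proposition gives a concrete test: a set S of variables supports a model exactly when every generator m(p, q) contains a variable from S. The sketch below applies this to the five observations of the earlier example (n = 5, k = F5) by brute-force search over subsets, which is an illustrative shortcut and not necessarily the method used by the authors. The function names are chosen here.

```python
from itertools import combinations

# Observations (s_j, t_j) from the example with n = 5 and k = F_5.
DATA = [
    ((3, 0, 0, 0, 0), 3),
    ((0, 1, 2, 1, 4), 1),
    ((0, 1, 2, 1, 0), 0),
    ((0, 1, 2, 1, 1), 0),
    ((1, 1, 1, 1, 3), 4),
]
N = 5

def generators(data):
    """Supports of the monomials m(p, q): coordinates where p and q differ, for t_p != t_q."""
    gens = set()
    for (p, a), (q, b) in combinations(data, 2):
        if a != b:
            gens.add(frozenset(i + 1 for i in range(len(p)) if p[i] != q[i]))
    return gens

def minimal_variable_sets(data, n):
    """All inclusion-minimal sets S of variable indices that meet every generator."""
    gens = generators(data)
    found = []
    for size in range(n + 1):  # increasing size, so minimal sets are found first
        for cand in combinations(range(1, n + 1), size):
            S = set(cand)
            if any(prev <= S for prev in found):
                continue  # S contains a smaller valid set, so it is not minimal
            if all(S & g for g in gens):
                found.append(frozenset(S))
    return sorted(sorted(S) for S in found)

print(minimal_variable_sets(DATA, N))
```

For these data the generators reduce to x5 and x1 x2 x3 x4, so every minimal set contains x5 together with one of x1, …, x4.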
