Probabilistic Modeling of System Dynamics: Integrating Stochastic Processes and Biological Mechanisms

Probabilistic (= stochastic) modeling
Intrinsic dynamics of the system state x (x = huge vector)
Observation / measurement
Clockwork universe (Isaac Newton)
In practice:
- we do not know the initial conditions x(t0) exactly
- we do not know f exactly
We call our ignorance "noise" and dress it mathematically by introducing random terms in x(t0) and f.
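A minimal sketch of what "introducing random terms in x(t0) and f" can look like in practice. It assumes a scalar state, a hypothetical linear decay f(x) = -x, Gaussian uncertainty in x(t0), and additive white noise of strength sigma in the dynamics, integrated with the Euler-Maruyama scheme. None of these choices come from the slides; they are illustrative only.

```python
import math
import random


def simulate_ensemble(f, x0_mean, x0_sd, sigma, t_end, dt, n_paths, seed=0):
    """Euler-Maruyama integration of dx = f(x) dt + sigma dW for many paths.

    Each path starts from a noisy initial condition x(t0) ~ N(x0_mean, x0_sd).
    Returns the list of final states x(t_end), one per path.
    """
    rng = random.Random(seed)
    n_steps = int(round(t_end / dt))
    sqrt_dt = math.sqrt(dt)
    finals = []
    for _ in range(n_paths):
        x = rng.gauss(x0_mean, x0_sd)          # random term in x(t0)
        for _ in range(n_steps):
            dw = rng.gauss(0.0, sqrt_dt)       # Wiener increment over dt
            x += f(x) * dt + sigma * dw        # random term added to f
        finals.append(x)
    return finals


def decay(x):
    """Hypothetical deterministic part f(x) = -k x with k = 1 (an assumption)."""
    return -1.0 * x


finals = simulate_ensemble(decay, x0_mean=1.0, x0_sd=0.1, sigma=0.2,
                           t_end=1.0, dt=0.01, n_paths=2000)
ensemble_mean = sum(finals) / len(finals)
ensemble_sd = math.sqrt(sum((v - ensemble_mean) ** 2 for v in finals) / len(finals))
print("mean of x(t_end):", ensemble_mean)
print("spread of x(t_end):", ensemble_sd)
```

With sigma = 0 and x0_sd = 0 every path collapses onto the single deterministic ("clockwork") trajectory; the spread of the ensemble is what the noise terms add.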

Terminology
The time evolution of the system state X of a deterministic process is described by (ordinary or partial) differential equations.
For stochastic processes, the most useful subcategory is Markov processes. Here, the time evolution is described by a Langevin equation (for individual trajectories) or, for the probability distribution P(X), by a Fokker-Planck equation or a master equation.
There are efficient computer algorithms for solving these equations numerically ("Monte Carlo simulation").
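As an illustration of Monte Carlo simulation of a Markov process, the following sketch uses the Gillespie stochastic simulation algorithm to draw one exact trajectory of a birth-death process (constant production, first-order degradation). The process, the rate constants, and the helper names are assumptions chosen for the example; the slides do not specify a particular system or algorithm.

```python
import random


def gillespie_birth_death(k_prod, k_deg, n0, t_end, seed=0):
    """Draw one trajectory of the birth-death master equation.

    Reactions: 0 -> X at rate k_prod, X -> 0 at rate k_deg * n.
    Returns the event times and the copy number after each event.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_birth = k_prod
        a_death = k_deg * n
        a_total = a_birth + a_death
        if a_total == 0.0:
            break                              # no reaction can fire any more
        t += rng.expovariate(a_total)          # waiting time to the next event
        if t > t_end:
            break
        if rng.random() * a_total < a_birth:   # choose which reaction fires
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


def time_average(times, counts, t_start, t_end):
    """Time-weighted mean of the piecewise-constant trajectory on [t_start, t_end]."""
    total = 0.0
    for i, n in enumerate(counts):
        seg_start = max(times[i], t_start)
        next_time = times[i + 1] if i + 1 < len(times) else t_end
        seg_end = min(next_time, t_end)
        if seg_end > seg_start:
            total += n * (seg_end - seg_start)
    return total / (t_end - t_start)


times, counts = gillespie_birth_death(k_prod=10.0, k_deg=1.0, n0=0, t_end=500.0)
avg = time_average(times, counts, t_start=50.0, t_end=500.0)
print("time-averaged copy number:", avg, "(stationary mean k_prod / k_deg = 10)")
```

Averaging many such trajectories (or one long trajectory, as here) approximates the stationary solution of the master equation without solving it directly.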

The forward and the backward problem
Forward problem: given a model and its parameters, what does the solution look like? This is often the preoccupation of physicists, and of languages such as Matlab.
Backward problem: given data, which model and which parameters best explain them? This is often the preoccupation of statisticians, and of languages such as R.
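The two problems can be sketched side by side on a toy model. The example assumes an exponential decay x(t) = x0 exp(-k t) as the model, synthetic noisy data generated from it, and a simple grid search over k as the fitting method. The model, the noise level, and the grid are all hypothetical choices for illustration.

```python
import math
import random


def forward(k, x0, times):
    """Forward problem: model + parameters -> predicted solution x(t)."""
    return [x0 * math.exp(-k * t) for t in times]


def sum_sq_error(k, x0, times, data):
    """Misfit between the model prediction for rate k and the data."""
    return sum((m - d) ** 2 for m, d in zip(forward(k, x0, times), data))


def backward(x0, times, data, k_grid):
    """Backward problem: data -> parameter value that best explains them."""
    return min(k_grid, key=lambda k: sum_sq_error(k, x0, times, data))


rng = random.Random(0)
times = [5.0 * i / 29 for i in range(30)]
k_true = 0.7                                       # hypothetical "true" rate
data = [x + rng.gauss(0.0, 0.02) for x in forward(k_true, 1.0, times)]

k_grid = [0.01 * i for i in range(1, 301)]         # candidate rates 0.01 .. 3.00
k_hat = backward(1.0, times, data, k_grid)
print("true k:", k_true, "estimated k:", k_hat)
```

Note that the backward problem calls the forward problem repeatedly; in realistic settings the grid search would be replaced by an optimiser or a sampling method.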

Modelling is more than time-dependent differential equations
A differential equation predicts how an initial state x(t0) turns into subsequent states x(t). It also encodes ideas about the causality / mechanism of this turning.
Any mathematical approach that produces predictions and gives us causal understanding is a model. E.g.:
- genotype → phenotype
- CRM → expression pattern
- sequence → structure
- cellular signal → transcr. response → differentiation
- physical parameters of microtubules → cell shape

A multivariate genotype-phenotype model
Parental strains: BC187, YPS606 (oak and vineyard isolates)
Sporulation efficiency: 99% vs. 3.5%
374 recombinant segregants
Linkage analysis: 5 linked loci
Gehrke et al., Science 323 (2009)

A multivariate genotype-phenotype model
Model: analytical series
Parameter fit (ANOVA)
Gehrke et al., Science 323 (2009)
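A hedged sketch of what fitting a series-type genotype-phenotype model can look like. It assumes two biallelic loci coded 0/1, a phenotype written as intercept + additive terms + one pairwise interaction term, and ordinary least squares as the fit (the estimation step underlying ANOVA). The data are synthetic and the effect sizes are invented; this is not the published model or its parameters. Only the sample size of 374 is taken from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

n_segregants = 374                                   # sample size from the slide
genotypes = rng.integers(0, 2, size=(n_segregants, 2))   # two hypothetical loci, alleles 0/1
g1, g2 = genotypes[:, 0], genotypes[:, 1]

# Design matrix of the truncated series: 1, g1, g2, g1*g2
X = np.column_stack([np.ones(n_segregants), g1, g2, g1 * g2])

# Invented effect sizes (intercept, locus 1, locus 2, interaction) and noise
true_beta = np.array([0.05, 0.30, 0.20, 0.40])
phenotype = X @ true_beta + rng.normal(0.0, 0.05, size=n_segregants)

# Least-squares parameter fit
beta_hat, *_ = np.linalg.lstsq(X, phenotype, rcond=None)

# Fraction of phenotypic variance explained by the fitted model
residuals = phenotype - X @ beta_hat
r_squared = 1.0 - residuals.var() / phenotype.var()

print("fitted coefficients:", np.round(beta_hat, 3))
print("variance explained:", round(float(r_squared), 3))
```

Extending the series to more loci adds one column per locus and per interaction term, which is where the number of parameters starts to grow quickly.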

Challenges for Future Research
- Brute-force (unbiased) fitting is not possible for larger sets of variables because of overfitting: we need smart regularisations of the models, informed by prior knowledge ("networks", mechanisms, physics)
- Multivariate phenotypes (morphology, dynamic behaviour, e.g. from imaging)
- Model dependencies between phenotypes; model underlying molecular mechanisms
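The overfitting problem and the effect of a regulariser can be sketched on synthetic data. The example assumes more variables (50) than samples (30), a sparse invented ground truth, and a plain ridge (L2) penalty. Ridge is a generic stand-in here: it does not encode network, mechanistic, or physical prior knowledge, which is what the slide actually calls for, and whether it helps on held-out data depends on the penalty strength and the data.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: solve (X^T X + lam I) beta = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


def mse(X, y, beta):
    """Mean squared prediction error of coefficients beta on (X, y)."""
    return float(np.mean((X @ beta - y) ** 2))


rng = np.random.default_rng(1)
n_train, n_test, n_features = 30, 200, 50            # more variables than samples

beta_true = np.zeros(n_features)
beta_true[:3] = [1.0, -0.5, 0.8]                     # invented sparse ground truth

X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
y_train = X_train @ beta_true + rng.normal(0.0, 0.5, size=n_train)
y_test = X_test @ beta_true + rng.normal(0.0, 0.5, size=n_test)

# Unregularised fit: minimum-norm least squares, which interpolates the training data
beta_ols, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
# Regularised fit with a hypothetical penalty strength
beta_ridge = ridge_fit(X_train, y_train, lam=5.0)

for name, beta in [("unregularised", beta_ols), ("ridge", beta_ridge)]:
    print(name,
          "train MSE:", round(mse(X_train, y_train, beta), 4),
          "test MSE:", round(mse(X_test, y_test, beta), 4),
          "coefficient norm:", round(float(np.linalg.norm(beta)), 3))
```

The unregularised fit reproduces the training data essentially perfectly, noise included, which is the signature of overfitting; the penalty trades some training accuracy for smaller coefficients.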

Quantitative Modelling: Thermodynamic-based
Cis-regulatory networks: the Eve locus as a paradigm
Janssens and Reinitz, Nat Genet, v38, 2006
Zinzen et al, Curr Biol, v16, 2006
Segal et al, Nature, v451, 2008
Furlong Group

Predicting enhancer temporo-spatial expression
Machine learning (SVM)
Input: ChIP binding atlas (TF binding profiles)
Training: 5 training sets (groups of enhancers with similar spatial patterns of expression), including mesoderm, somatic muscle, visceral muscle
Predict: enhancer expression based on similar binding profiles to the training set
CAD: in vivo reporter assays (enhancer spatio-temporal expression patterns)
Furlong Group
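A minimal sketch of the idea of classifying enhancers from binding profiles with an SVM and scoring it by cross-validation. It assumes a binary problem (one expression class versus the rest), synthetic "occupancy" features for 5 hypothetical TFs, and a linear SVM trained by full-batch subgradient descent on the hinge loss. The actual study's features, kernel, classes, and software are not given in the slides, so everything below is illustrative.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, lr=0.1, n_epochs=200):
    """Subgradient descent on the L2-regularised hinge loss; labels y in {-1, +1}."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_epochs):
        margins = y * (X @ w + b)
        active = margins < 1.0                     # samples inside or beyond the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n_samples
        grad_b = -y[active].sum() / n_samples
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(X, w, b):
    """Class label from the sign of the decision function."""
    return np.where(X @ w + b >= 0.0, 1, -1)


def k_fold_accuracy(X, y, k=5, seed=0):
    """Mean held-out accuracy over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accuracies = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        w, b = train_linear_svm(X[train_idx], y[train_idx])
        accuracies.append(np.mean(predict(X[test_idx], w, b) == y[test_idx]))
    return float(np.mean(accuracies))


# Synthetic binding profiles: rows = enhancers, columns = hypothetical TFs
rng = np.random.default_rng(0)
n_per_class, n_tfs = 40, 5
X_in_class = rng.normal(loc=1.0, scale=0.5, size=(n_per_class, n_tfs))
X_other = rng.normal(loc=-1.0, scale=0.5, size=(n_per_class, n_tfs))
X = np.vstack([X_in_class, X_other])
y = np.concatenate([np.ones(n_per_class, dtype=int), -np.ones(n_per_class, dtype=int)])

cv_accuracy = k_fold_accuracy(X, y, k=5)
print("5-fold cross-validated accuracy:", cv_accuracy)
```

A multi-class setting such as the five training sets on the slide could be handled by training one such classifier per class (one-versus-rest); that extension is an assumption, not something the slides state.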

[Figure: CRM binding (ChIP) and enhancer activity (GFP); test-set classification performance for the training sets: mesoderm, visceral muscle, somatic muscle, mesoderm & somatic muscle, visceral & somatic muscle]

TF occupancy is sufficient to predict enhancer activity
Validating predictions with transgenic GFP-reporter assays
Cross-validation; 6 or more predictions tested per group
Group                        %     Correct  Partial  Fail
Mesoderm                     86%   6        0        1
Visceral muscle              87%   7        0        1
Visceral & somatic muscle    83%   5        1        0
Mesoderm & somatic muscle    83%   5        1        0
Furlong Group
