Probabilistic Approximations of Bio-Pathways Dynamics

Probabilistic Approximations of Bio-Pathways Dynamics P.S. Thiagarajan School of Computing, National University of Singapore Joint Work with: Liu Bing, David Hsu

Bio-pathways • Intracellular: • Gene regulatory networks • Metabolic networks • signalling pathways.

A Common Modeling Approach • View a bio-pathway as a network of bio-chemical reactions • Model the network as a system of ODEs • One for each molecular species • Mass action, Michelis-Menten, Hill, etc. • Study the ODE system to understand the dynamics of the bio-chemical network

Major Hurdles • Many rate constants not known • must be estimated • noisy data; limited precision; population-based • High dimensional system • closed-form solutions are impossible • Must resort to numerical simulations • Point values of initial states/data will not be available. • a large number of numerical simulations needed for answering each question

Our work: • Decompose  Analyze  Compose (ISMB’06, WABI’07) • Probabilistic models and methods to: • approximate the ODE dynamics(CMSB’09, TCS’11). • update models (RECOMB’10)

The “Opinion Poll” Idea • Start with an ODEs system. • Discretize the time and value domains. • Assume a (uniform) distribution of initial states • Generate a “sufficiently” large number of trajectories by • sampling the initial states and numerical simulations. • Encode this collection of trajectories as a dynamic Bayesian network. • ODEs  DBN

The “opinion poll” Idea • Pay the one-time cost of constructing the Bayesian network. • Amortize this cost: • perform multiple analysis tasks using inferencing algorithms for DBNs.

x(t) tmax t0 t1 t2 ... ... ... ... Time Discretization • Observe the system only at discrete time points; in fact, only at finitely many time points x(t) = t3 + 4t + x(0)

x(t) E D C B A t0 t1 t2 tmax ... ... ... ... Value Discretization • Observe only with bounded precision

D D E D C B C x(t) E D C B A t0 t1 t2 tmax ... ... ... ... Discrete Trajectories • A trajectory now is sequence of discrete values

Discretized Behavior Assume a prior distribution of the initial states. Behavior = (Uncountably) many sequences of discrete values. D D E D C B C ... ... D x(t) E D C B A t t0 t1 t2 tmax ... ... ... ...

Representation of Behavior In fact, a probabilistic transition system. Pr( (D, 2) (E, 3) ) is the “fraction” of the trajectories residing in D at t = 2 that land in E at t = 3. 0.8 D D E D C B C 0.2 D E D C B A t t0 t1 t2 tmax ... ... ... ...

Mathematical Justification The value space of the variables is assumed to be a compact subset C of In Z’ = F(Z), F is assumed to be differentiable everywhere in C. Mass-law, Michaelis-Menton,… for every rate constant k. Then the solution (flow) t : C  C (for each t) exists, is unique, a bijection, continuous and hence measurable. But the transition probabilities can not be computed.

0.8 D D E D C B C 0.2 D Computational Approximation (s, i) – States; (s, i)  (s’, i+1) -- Transitions Sample, say, 1000 times the initial states. Through numerical simulation, generate 1000 trajectories. Pr((s, i)  (s’ i+1)) is the fraction of the trajectories that are in s at ti which land in s’ at t i+1. E D C B A t t0 t1 t2 tmax ... ... ... ...

Infeasible Size! • But the transition system will be huge. • O(T . kn) • k  2 and n ( 50-100) can be large.

Compact Representation • Exploit the network structure (additional independence assumptions) to construct a dynamic Bayesian network instead. • The DBN is a factored form of a probabilistic transition system (Markov chain).

The DBN Representation S0 S3 S2 S1 E3 E2 E0 E1 ES3 ES0 ES2 ES1 P1 P0 P2 P3 ... ... ... ... ... ... ... ...

The DBN Representation S0 S1 S3 S2 E0 E3 E1 E2 ES0 ES1 ES3 ES2 P2 P0 P3 P1 A B ... ... C ... ... ... ... ... ...

Each node has a CPT associated with it. This specifies the local (probabilistic) dynamics. S2 S3 S1 S0 E2 E1 E0 E3 ES0 ES1 ES2 ES3 P0 P1 P3 P2 P(S2=C|S1=B,E1=C,ES1=B)= 0.2 P(S2=C|S1=B,E1=C,ES1=C)= 0.1 P(S2=A|S1=A,E1=A,ES1=C)= 0.05 . . . A B ... ... C ... ... . ... ... ... ... B B B C C C C B B A ... ... C B B C C A B C C C

Fill up the entries in the CPTs by sampling, simulations and counting S0 S2 S3 S1 E0 E2 E1 E3 ES2 ES3 ES0 ES1 P3 P2 P1 P0 P(S2=C|S1=B,E1=C,ES1=B)= 0.2 P(S2=C|S1=B,E1=C,ES1=C)= 0.1 P(S2=A|S1=A,E1=A,ES1=C)= 0.05 . . . A B ... ... C ... ... . ... ... ... ...

Computational Approximation Fill up the entries in the CPTs by sampling, simulations and counting S0 S2 S3 S1 E0 E2 E1 E3 ES2 ES3 ES0 ES1 P3 P2 P1 P0 500 100 A B ... ... C ... ... 1000 ... ... ... ...

The Technique Fill up the entries in the CPTs by sampling, simulations and counting S0 S2 S3 S1 E0 E2 E1 E3 ES2 ES3 ES0 ES1 P3 P2 P1 P0 P(S2=C|S1=B,E1=C,ES1=B)= 100/500= 0.2 500 100 A B ... ... C ... ... ... ... ... ...

The Technique S0 S1 S3 S2 E0 E1 E2 E3 ES0 ES1 ES2 ES3 P0 P1 P2 P3 P(S2=C|S1=B,E1=C,ES1=B)= 100/500= 0.2 500 100 A The size of the DBN is: O(T . n . kd) B ... ... C ... ... ... ... d will be usually much smaller than n. ... ...

Unknown rate constants S0 S3 S2 S1 E3 E2 E0 E1 ES3 ES0 ES2 ES1 P1 P0 P2 P3 k2 ... ... ... ... ... ... ... ...

Unknown rate constants During the numerical generation of a trajectory, the value of k2 does not change after sampling. S0 S1 S3 S2 E0 E3 E1 E2 ES0 ES1 ES3 ES2 k2 P2 P0 P3 k2 k2 P1 k2 ... ... ... ... ... ... ... ...

Unknown rate constants Sample uniformly across all the Intervals. S0 S1 S3 S2 E0 E3 E1 E2 ES0 ES1 ES3 ES2 k2 P2 P0 P3 k2 k2 P1 k2 ... ... ... ... ... ... ... ...

Bayesian inference S0 S1 S2 S3 E0 E1 E2 E3 ES2 ES3 ES0 ES1 k2 k2 k2 P2 k2 P1 P0 P3 ... ... ... ... This is the Factored Frontier algorithm (Murphy & Weiss) Approximate but fast: ... ... ... ...

Parameter Estimation S0 S1 S2 S3 E0 E1 E2 E3 ES2 ES3 ES0 ES1 k2 k2 k2 P2 k2 P1 P0 P3 For each choice of (jnterval) values for unknown parameters, present experimental data as evidence and assign a score using FF. Return parameter estimates as maximal likelihoods. Similar application of FF to do sensitivity analysis. ... ... ... ... ... ... ... ...

DBN based Analysis • Our experiments with signalling pathways models (taken from the biomodels data base) suggest: • The one-time cost of constructing the DBN can be easily recovered by using it to do parameter estimation and sensitivity analysis.

Segmentation Clock Network • Governs the periodic formation process of somites during embryogenesis • The underlying signaling network couples three oscillating pathways: FGF, Wnt and Notch. • 22 ODEs; 75 unknown parameters • 5 equal intervals; t = 5 min; 100 time points; 2.5 million trajectories; 3.1 hours on a 10 PC cluster.

Results • Global sensitivity analysis • ODE based: 81 hours • DBN based: 3 hours LM: Levenberg–Marquardt GA: Genetic Algorithm SRES: Evolutionary Strategy PSO: Particle Swarm Optimization BDM: our method 40 unknown parameters; Synthetic data: 8 proteins, 10 time points (noise with 5% variance added)

ODE model (Brown et al. 2004) 32 species 48 parameters Features: Good size Feedback loops DBN construction Settings 5 intervals 1 min time-step, 100 minutes 3 x 106 samples Runtime 4 hours on a cluster of 10 PCs EGF-NGF Pathway

Parameter Estimation 20 (out of 48) unknown parameters Synthetic data: 9 proteins, 10 time points LM: Levenberg–Marquardt GA: Genetic Algorithm SRES: Evolutionary Strategy PSO: Particle Swarm Optimization BDM: our method

Sensitivity Analysis • Running time • ODE based: 22 hours • DBN based: 34 minutes

Complement System Complement system is a critical part of the immune system Ricklin et al. 2007

Amplified response Inhibition Classical pathway Lectin pathway

Goals Quantitatively understand the regulatory mechanisms of complement system How is the complement activity enhanced under inflammation condition? How is the excessive response of the complement avoided? The model: Classical pathway + the lectin pathway enhancement Inhibitory mechanisms C4BP

Complement System • ODE Model • 42 Species • 45 Reactions • Mass law • Michaelis-Menten kinetics • 92 Parameters (71 unknown) • DBN Construction • Settings • 6 intervals • 100s time-step, 12600s • 1.2 x 106 samples • Runtime • 12 hours on a cluster of 20 PCss

Parameter estimation Training data 4 proteins, 7 time points, 4 conditions * inflammation condition: pH 6.5 calcium 2 nM

Model Calibration (parameter estimation)

Model validation Validated the model using previous published data (Zhang et al 2009)

Enhancement mechanism The antimicrobial response is sensitive to the pH and calcium level

Model prediction: The regulatory effect of C4BP C4BP maintains classical complement activation but delays the maximal response time. But attenuates the lectin pathway activation. Classical pathway Lectin pathway

The regulatory mechanism of C4BP The major inhibitory role of C4BP is to facilitate the decay of C3 convertase A B A D B C C D

Results Both predictions concerning C4BP were experimentally verified. “A Computational and Experimental Study of the Regulatory Mechanisms of the Complement System” Liu et.al: PLoS Comp.Bio7(2011) BioModels database (303.Liu).

Current Collaborators Marie-Veronique Clement G V Shivashankar Ding Jeak Ling Immune signalling during Multiple infections DNA damage/response pathway Chromosome co-localizations and co-regulations

Current work • Parametrized version of FF • Reduce errors by investing more computational time. • Probabilistic verification methods. • Monte Carlo, FF based, … • Implementation on a GPU platform.

Conclusion The DBN approximation method is useful and efficient. This is all very well in practice but will it work in theory?

Our group: S. Akshay Liu Bing Abhinav Dubey Blaise Genest Benjamin Gyori Wang Junjie David Hsu Suchee Palaniappan Gireedhar Venkatachalam P.S. Thiagarajan

Probabilistic Approximations of Bio-Pathways Dynamics

Probabilistic Approximations of Bio-Pathways Dynamics

Presentation Transcript

APPROXIMATIONS ERRORS

Sparse Approximations

From Under-approximations to Over-approximations and Back

Polynomial Approximations

Area Approximations

Approximations

Properties of Approximations

Probabilistic Approximations of ODEs Based Signaling Pathways Dynamics

Approximating Bio-Pathways Dynamics

Quality of Pareto set approximations

Molecular Motion Pathways: Computation of Ensemble Properties with Probabilistic Roadmaps

BION: SYNTHETIC PATHWAYS TO BIO-INSPIRED INFORMATION PROCESSING

Continued development of Probabilistic Molecular Dynamics

Numerical Approximations of Definite Integrals

Multiscale Dynamics of Bio-Systems: Molecules to Continuum

z-Approximations

HARDNESS OF APPROXIMATIONS

Principled Approximations in Probabilistic Programming

A probabilistic approach to exploring global dynamics

Bio-Optical Assessment of Giant Kelp Dynamics

Quality of Pareto set approximations

Linear Approximations