Nonparametric Bayesian inference for perturbed and orthologous gene regulatory networks. Christopher A. Penfold, Vicky Buchanan-Wollaston, Katherine J. Denby and David L. Wild. Published in Bioinformatics.
Motivation
• Reverse engineering of gene regulatory networks (GRNs) is an active area of research.
• Previous approaches to multiple time series datasets assume an identical network topology across datasets.
• In reality, different but overlapping sets of transcription factors are expected to bind under different conditions.
• How should the information from this somewhat diverse collection of datasets be handled?
• Some researchers have used nonparametric Bayesian learning strategies, but for these techniques to be computationally feasible, the number of transcription factors (TFs) that can bind to the promoter region of a gene needs to be limited.
• However, recent studies suggest that a large number of TFs have the potential to bind to any gene.
• It has also been observed that fewer TFs bind under any specific condition.
• It is therefore of interest to find the subset of TFs that bind under each specific condition, which can result in a different GRN.
Causal Structure Identification
• The CSI algorithm (Klemm, 2008; Penfold and Wild, 2011) and related approaches (Äijö and Lähdesmäki, 2009) have previously been used to reverse engineer GRNs and have been shown to perform well.
• The discrete-time version of CSI models the mRNA expression level of a particular gene i in a larger set G (i ∈ G) as:
where x_i(t) represents the expression level of gene i at time t; Pa(i) ⊆ G represents the genes encoding the TFs that bind the promoter region of gene i (the parents of gene i), with x_Pa(i)(t) the vector of expression levels of those parents at time t; and f(·) represents some unknown (non-linear) function capturing the dynamics of the system.
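The equation itself did not survive on the slide. A form consistent with the definitions above is sketched below; the one-step time lag and the additive noise term ε are assumptions, not taken from the slide.

```latex
x_i(t+1) = f\big(\mathbf{x}_{Pa(i)}(t)\big) + \varepsilon
```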
Usually the parent genes are not known a priori, so the data are used to infer them as follows:
where T ⊆ G represents the set of all transcription factors and θ_k the set of hyperparameters for the k-th parental set. The distribution depends on the values of these hyperparameters, which are estimated using Expectation Maximization.
Finally, a distribution over causal network structures, P(M), can be assembled from the distribution over individual parental sets, constituting the CSI algorithm:
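The steps described above (enumerating candidate parental sets, weighting each by how well it explains the data, and combining the per-gene distributions into a distribution over networks) can be illustrated with a minimal sketch. All function names here are hypothetical, the prior over parental sets is assumed uniform, and `toy_log_likelihood` is a stand-in score invented for the example; it is not the likelihood used by CSI.

```python
import math
from itertools import combinations


def parental_sets(tfs, max_parents):
    """Enumerate every subset of `tfs` with at most `max_parents` members."""
    sets = []
    for size in range(max_parents + 1):
        sets.extend(frozenset(c) for c in combinations(tfs, size))
    return sets


def parent_posterior(log_likelihood, candidates):
    """Normalise per-set log scores into a distribution (uniform prior assumed)."""
    logs = [log_likelihood(pa) for pa in candidates]
    peak = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(value - peak) for value in logs]
    total = sum(weights)
    return {pa: w / total for pa, w in zip(candidates, weights)}


def edge_probabilities(posterior, tfs):
    """Marginal probability that each TF appears in the parental set."""
    return {tf: sum(p for pa, p in posterior.items() if tf in pa) for tf in tfs}


def network_probability(posteriors, network):
    """P(M) as a product of per-gene parental-set probabilities."""
    prob = 1.0
    for gene, parents in network.items():
        prob *= posteriors[gene].get(frozenset(parents), 0.0)
    return prob


# Toy example: three candidate TFs, at most two parents per gene.
tfs = ["A", "B", "C"]
true_parents = frozenset({"A", "B"})


def toy_log_likelihood(pa):
    # Hypothetical score: penalise each TF that differs from the "true" set.
    return -2.0 * len(pa ^ true_parents)


candidates = parental_sets(tfs, max_parents=2)
posterior = parent_posterior(toy_log_likelihood, candidates)
edges = edge_probabilities(posterior, tfs)
```

Capping the number of parents keeps the enumeration tractable, which is the computational constraint noted in the motivation: the number of candidate sets grows combinatorially with the cap.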
Hierarchical modelling for CSI
• In this framework, the joint distribution for all model parameters conditioned on the data is factorised as:
The conditional distribution for the parents of gene i in dataset j given the hyperparent is chosen to correspond to a Gibbs distribution:
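As an illustration of how a Gibbs distribution ties a dataset-specific parental set to the hyperparent, the sketch below assigns each candidate set a probability proportional to exp(−β · energy). The energy function (the number of TFs by which the two sets differ), the parameter name `beta`, and all function names are assumptions made for this example; the slide does not state them.

```python
import math
from itertools import combinations


def set_difference_energy(parents, hyperparent):
    """Assumed energy: number of TFs present in exactly one of the two sets."""
    return len(frozenset(parents) ^ frozenset(hyperparent))


def gibbs_distribution(candidates, hyperparent, beta):
    """P(parents | hyperparent, beta) proportional to exp(-beta * energy)."""
    weights = {
        pa: math.exp(-beta * set_difference_energy(pa, hyperparent))
        for pa in candidates
    }
    z = sum(weights.values())  # normalising constant
    return {pa: w / z for pa, w in weights.items()}


# All subsets of three hypothetical TFs.
tfs = ["A", "B", "C"]
all_sets = [
    frozenset(c) for size in range(len(tfs) + 1) for c in combinations(tfs, size)
]
hyperparent = frozenset({"A", "B"})

uncoupled = gibbs_distribution(all_sets, hyperparent, beta=0.0)
coupled = gibbs_distribution(all_sets, hyperparent, beta=2.0)
```

With `beta = 0` every parental set is equally likely, so the dataset is effectively independent of the hyperparent; as `beta` grows, probability mass concentrates on sets close to the hyperparent, so the datasets share more structure.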
Again, a network structure can be assembled from the parent distributions for each node, with a hypernetwork assembled from the distributions over hyperparents:
Combining hierarchical modelling and yeast one-hybrid
• Yeast one-hybrid (Y1H) was used to identify the genes capable of binding to the promoter region.
• In this study the gene RD29A was used; a previous study suggests 9 such genes.
• However, in this study, time series data were collected at 6 time points under different conditions.