Probabilistic graphical models and regulatory networks
BMI/CS 576, www.biostat.wisc.edu/bmi576.html
Sushmita Roy (sroy@biostat.wisc.edu)
Nov 27th, 2012
Two main questions in regulatory networks
[figure: regulators Hot1 and Sko1 (X1, X2) controlling the target gene HSP12 (Y), with candidate functions ψ(X1, X2): Boolean, linear, differential equations, probabilistic, ...]
• Structure: who are the regulators? (e.g., HSP12 is a target of Hot1, i.e., Hot1 regulates HSP12)
• Function: how do they determine expression levels?
Graphical models for representing regulatory networks
• Bayesian networks
• Dependency networks
• random variables encode expression levels
• edges correspond to some form of statistical dependencies
[figure: regulators Sho1 and Msb2 (X1, X2) with target Ste20 (Y3); structure is the graph, function is Y3 = f(X1, X2)]
Bayesian networks: estimate a set of conditional probability distributions
[figure: unknown regulators (parents) with edges into the target Yi (child)]
• Function: the conditional probability distribution (CPD) of the target given its regulators
Dependency networks: a set of regression problems
• Function: linear regression with a regularization term
• each target Yi is modeled as a linear combination of its regulators, Yi ≈ Σj bj Xj (in matrix form, a d × p regulator matrix times a p × 1 coefficient vector; d: number of genes, p: number of regulators), fit with a regularization term; a sketch of one such regression follows
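A minimal sketch of one such regularized regression (the data and names are illustrative, not from the slides): fit an L1-penalized linear model for a single target gene, so the nonzero coefficients bj pick out its inferred regulators.

```python
# Dependency-network idea: for each target gene, fit a regularized linear
# regression of its expression on the candidate regulators.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
d, p = 100, 5                    # d measurements, p candidate regulators
X = rng.normal(size=(d, p))      # regulator expression levels
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.1 * rng.normal(size=d)  # one target

# The L1 penalty (the "regularization term") drives most coefficients to
# zero, so the nonzero b_j identify the inferred regulators of this target.
model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)               # nonzero entries ~ regulators 0 and 2
```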
Bayesian networks
• a BN is a directed acyclic graph (DAG) in which
  • the nodes denote random variables
  • each node X has a conditional probability distribution (CPD) representing P(X | Parents(X))
• a type of probabilistic graphical model (PGM), specified by
  • a graph
  • parameters (for the probability distributions)
• the intuitive meaning of an arc from X to Y is that X directly influences Y
• provides a tractable way to work with large joint distributions, via the factorization P(X1, ..., Xn) = Πi P(Xi | Parents(Xi))
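As a toy illustration of that factorization (the variables and numbers are made up), the joint over two binary variables is just the product of the CPDs:

```python
# P(X1, X2) = P(X1) * P(X2 | X1) for binary variables, with CPDs stored
# as dicts keyed by the parents' values.
p_x1 = {(): 0.3}                   # P(X1=1)
p_x2 = {(0,): 0.8, (1,): 0.2}      # P(X2=1 | X1)

def bernoulli(p_one, value):
    return p_one if value == 1 else 1.0 - p_one

def joint(x1, x2):
    # P(X1=x1, X2=x2) = P(X1=x1) * P(X2=x2 | X1=x1)
    return bernoulli(p_x1[()], x1) * bernoulli(p_x2[(x1,)], x2)

# sanity check: the factorized joint sums to 1
assert abs(sum(joint(a, b) for a in (0, 1) for b in (0, 1)) - 1.0) < 1e-12
```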
Example Bayesian network
[figure: a five-node DAG over binary variables X1, ..., X5, with parents and children labeled]
• assume each Xi is binary
• with no independence assertions, representing the joint distribution needs 2^5 measurements
• with the network's independence assertions, it needs 2^3 measurements
Representing CPDs for discrete variables
• CPDs can be represented using tables or trees
• consider the following case with Boolean variables A, B, C, D
P(D | A, B, C) as a table: one row per configuration of (A, B, C), each giving Pr(D = t)
P(D | A, B, C) as a tree:
  A = f: Pr(D = t) = 0.9
  A = t: test B
    B = f: Pr(D = t) = 0.5
    B = t: test C
      C = f: Pr(D = t) = 0.8
      C = t: Pr(D = t) = 0.5
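A small sketch of the two representations (the tree's branch orientation follows the slide as extracted and may differ from the original figure):

```python
# Tree representation: collapses parent configurations that share a
# probability, so it can be much smaller than the full table.
def cpd_tree(a, b, c):
    if not a:
        return 0.9
    if not b:
        return 0.5
    return 0.8 if not c else 0.5

# Table representation: 2^3 = 8 rows, one per (A, B, C) configuration.
cpd_table = {(a, b, c): cpd_tree(a, b, c)
             for a in (False, True)
             for b in (False, True)
             for c in (False, True)}
print(cpd_table)
```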
Representing CPDs for continuous variables
• a conditional Gaussian CPD: the child is normally distributed with a mean that is a linear function of its parents, e.g. P(X3 | X1, X2) = N(b0 + b1 X1 + b2 X2, σ²)
[figure: parents X1, X2 with child X3; parameters b0, b1, b2, σ²]
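A minimal sketch of evaluating such a CPD, assuming the standard linear-Gaussian parameterization above (the coefficient values here are illustrative):

```python
# Conditional (linear) Gaussian CPD: the child's density given its parents.
from scipy.stats import norm

def cg_density(x3, x1, x2, b0=0.1, b1=0.5, b2=-0.3, sigma=1.0):
    # mean of X3 is a linear function of the parents X1, X2
    mean = b0 + b1 * x1 + b2 * x2
    return norm.pdf(x3, loc=mean, scale=sigma)

print(cg_density(0.0, 1.0, 2.0))
```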
The learning problems
• Parameter learning: given a network structure, learn the parameters from data
• Structure learning: given data, learn the structure (and parameters)
  • subsumes parameter learning
Structure learning
• Maximum likelihood framework: score a candidate graph G by the data likelihood under the best-fit parameters, Score(G) = log P(D | G, θ̂_G)
• Bayesian framework: score by the posterior, Score(G) = log P(D | G) + log P(G)
The structure learning task
• structure learning methods have two main components
  • a scheme for scoring a given BN structure
  • a search procedure for exploring the space of structures
Structure learning using score-based search
[figure: candidate graph structures, each scored (e.g., by maximum likelihood); the search returns the best graph]
Decomposability of scores
• the score decomposes over variables: S(G) = Σi s(Xi, Parents(Xi))
• thus we can independently optimize each term s(Xi, Parents(Xi))
• it all boils down to accurately estimating the conditional probability distributions (a sketch follows below)
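A hedged sketch of one per-variable term: the maximum-likelihood family score for discrete data, computed purely from counts of (parent configuration, child value). Summing it over all variables gives the decomposable score S(G).

```python
# Family log-likelihood s(Xi, Pa(Xi)) for discrete data, from counts.
import numpy as np
from collections import Counter

def family_loglik(child_col, parent_cols, data):
    # data: 2D int array, one row per sample
    counts = Counter()          # counts of (parent config, child value)
    parent_counts = Counter()   # counts of parent configs alone
    for row in data:
        pa = tuple(row[j] for j in parent_cols)
        counts[(pa, row[child_col])] += 1
        parent_counts[pa] += 1
    # ML log-likelihood: sum over cells of n * log(n / n_parent)
    return sum(n * np.log(n / parent_counts[pa])
               for (pa, x), n in counts.items())

data = np.array([[0, 0], [0, 1], [1, 1], [1, 1]])
print(family_loglik(1, [0], data))   # s(X2, {X1})
```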
Structure search operators
[figure: three variants of a network over A, B, C, D, before and after an operator]
• given the current network at some stage of the search, we can
  • add an edge
  • delete an edge
• check for cycles after each change
Bayesian network search: hill-climbing
given: data set D, initial network B0

i = 0
Bbest ← B0
while stopping criteria not met {
    for each possible operator application a {
        Bnew ← apply(a, Bi)
        if score(Bnew) > score(Bbest)
            Bbest ← Bnew
    }
    ++i
    Bi ← Bbest
}
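A runnable sketch of this loop (a stand-in, not the course's code): graphs are frozensets of directed edges over n variables, the operators are add-edge/delete-edge with a cycle check, and the score is a toy function in place of a data-based score.

```python
import itertools

def has_cycle(edges, n):
    # Kahn-style check: repeatedly strip nodes with no incoming edges.
    remaining, es = set(range(n)), set(edges)
    while True:
        sources = {v for v in remaining if not any(e[1] == v for e in es)}
        if not sources:
            return bool(remaining)   # leftovers mean a cycle
        remaining -= sources
        es = {e for e in es if e[0] in remaining}

def neighbors(edges, n):
    # apply each add-edge / delete-edge operator that keeps the graph a DAG
    for e in itertools.permutations(range(n), 2):
        new = edges | {e} if e not in edges else edges - {e}
        if not has_cycle(new, n):
            yield frozenset(new)

def hill_climb(score, n, max_iters=100):
    best = frozenset()               # B0: the empty graph
    for _ in range(max_iters):       # stop when no operator improves
        improved = max(neighbors(best, n), key=score)
        if score(improved) <= score(best):
            return best
        best = improved
    return best

# Toy score that just rewards one edge; a real score would be computed
# from data (e.g., a decomposable likelihood-based score).
print(hill_climb(lambda g: 1.0 if (0, 1) in g else 0.0, n=3))
```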
Learning networks from expression data is difficult because there are too few measurements
• reduce the candidate parents
  • Sparse Candidate
  • prior knowledge
  • MinREG
• reduce the target set
  • module networks
Bayesian network search: the Sparse Candidate algorithm [Friedman et al., UAI 1999]
given: data set D, initial network B0, parameter k
• the algorithm iterates two steps until convergence: a restrict step, which selects at most k candidate parents for each variable, and a maximize step, which searches for a high-scoring network in which each variable's parents come from its candidate set
The restrict step in Sparse Candidate
• to identify candidate parents in the first iteration, we can compute the mutual information between pairs of variables:
  I(X; Y) = Σx,y P̂(x, y) log [ P̂(x, y) / ( P̂(x) P̂(y) ) ]
• where P̂ denotes the probabilities estimated from the data set (a sketch of this computation follows)
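A small sketch of that computation: estimate the empirical joint P̂(x, y) from a discrete data set, then apply the formula above.

```python
import numpy as np

def mutual_information(x, y):
    # empirical joint \hat{P}(x, y) from co-occurrence frequencies
    xs, ys = np.unique(x), np.unique(y)
    joint = np.array([[np.mean((x == a) & (y == b)) for b in ys] for a in xs])
    px = joint.sum(axis=1, keepdims=True)   # \hat{P}(x)
    py = joint.sum(axis=0, keepdims=True)   # \hat{P}(y)
    nz = joint > 0                          # skip zero cells (0 log 0 = 0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px * py)[nz])))

x = np.array([0, 0, 1, 1, 0, 1])
y = np.array([0, 0, 1, 1, 1, 1])
print(mutual_information(x, y))   # higher when X and Y co-vary
```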
The restrict step in Sparse Candidate
[figure: the true network structure over A, B, C, D]
• suppose the true network structure is as shown, and we're selecting two candidate parents for A, with I(A; C) > I(A; D) > I(A; B)
• the candidate parents for A would then be C and D; how could we get B as a candidate parent on the next iteration?
The restrict step in Sparse Candidate
• Kullback-Leibler (KL) divergence provides a distance measure between two distributions, P and Q:
  D_KL(P || Q) = Σx P(x) log [ P(x) / Q(x) ]
• mutual information can be thought of as the KL divergence between P̂(X, Y) and the product P̂(X) P̂(Y), which assumes X and Y are independent
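A direct transcription of these two definitions (natural logs assumed; the example distribution is made up):

```python
import numpy as np

def kl(p, q):
    # D_KL(P || Q) over the cells where P puts mass
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# MI as the KL divergence between the joint and the product of marginals
pxy = np.array([[0.4, 0.1], [0.1, 0.4]])   # empirical P(X, Y)
px, py = pxy.sum(1), pxy.sum(0)
print(kl(pxy.ravel(), np.outer(px, py).ravel()))
```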
The restrict step in Sparse Candidate
[figure: the true distribution versus the current Bayes net over A, B, C, D]
• we can use KL divergence to assess the discrepancy between the network's estimate Pnet(X, Y) and the empirical estimate P̂(X, Y)
• how might we calculate Pnet(A, B)? (one sketch follows below)
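One hedged answer to the question above: sum the network's joint distribution over all variables other than A and B. The toy network below (structure and numbers are illustrative, not from the slides) marginalizes over a shared parent C:

```python
from itertools import product

p_c = {(): 0.5}                    # P(C=1)
p_a = {(0,): 0.2, (1,): 0.7}       # P(A=1 | C)
p_b = {(0,): 0.4, (1,): 0.9}       # P(B=1 | C)

def bern(p1, v):
    return p1 if v else 1.0 - p1

def p_net_ab(a, b):
    # marginalize the joint P(A, B, C) = P(C) P(A|C) P(B|C) over C
    return sum(bern(p_c[()], c) * bern(p_a[(c,)], a) * bern(p_b[(c,)], b)
               for c in (0, 1))

print({(a, b): round(p_net_ab(a, b), 3)
       for a, b in product((0, 1), repeat=2)})
```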
The restrict step in Sparse Candidate
• Pa(Xi): the current parents of Xi
• when selecting the candidate set for Xi, always keep Pa(Xi) in it; this is important to ensure monotonic improvement of the score
The maximize step in Sparse Candidate
• hill-climbing search with add-edge, delete-edge, and reverse-edge operators
• test to ensure that cycles aren't introduced into the graph
Efficiency of Sparse Candidate
• n = number of variables
• limiting each variable to at most k candidate parents means each search step weighs O(kn) possible edge changes, rather than the O(n²) faced by unrestricted search