
Learning In Bayesian Networks


Presentation Transcript


  1. Learning In Bayesian Networks

  2. Learning Problem
  • Set of random variables X = {W, X, Y, Z, …}
  • Training set D = {x1, x2, …, xN}
  • Each observation specifies values of a subset of the variables:
    x1 = {w1, x1, ?, z1, …}
    x2 = {w2, x2, y2, z2, …}
    x3 = {?, x3, y3, z3, …}
  • Goal: predict the joint distribution over some variables given other variables, e.g., P(W, Y | Z, X)
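As a concrete illustration of this setup, here is a minimal sketch (not from the slides) of how such a training set with missing entries might be represented; the variable names and values are hypothetical.

```python
# A minimal sketch of the data setup: each observation assigns values to a
# subset of the variables, and None marks a missing entry. Values are hypothetical.
variables = ["W", "X", "Y", "Z"]

D = [
    {"W": 1, "X": 0, "Y": None, "Z": 1},   # x1: Y unobserved
    {"W": 0, "X": 1, "Y": 1,    "Z": 0},   # x2: fully observed
    {"W": None, "X": 1, "Y": 0, "Z": 1},   # x3: W unobserved
]

# The learning goal is a predictive distribution such as P(W, Y | Z, X),
# estimated from D despite the missing entries.
fully_observed = [n for n, case in enumerate(D)
                  if all(v is not None for v in case.values())]
print("fully observed cases:", fully_observed)   # [1]
```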

  3. Classes Of Graphical Model Learning Problems
  • Network structure known, all variables observed
  • Network structure known, some missing data (or latent variables)
  • Network structure not known, all variables observed
  • Network structure not known, some missing data (or latent variables)
  The first two cases are covered today and next class; the last two we are going to skip (not too relevant for the papers we'll read; see the optional readings for more info).

  4. Learning CPDs When All Variables Are Observed And Network Structure Is Known
  • Trivial problem?
  [Figure: example network over X, Y, Z with a table of training data]
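To make the "trivial problem" concrete: with complete data and a known structure, each CPT is just a table of normalized counts. A minimal sketch, assuming binary variables and a two-node net X → Y (the slide's figure uses X, Y, Z, but the idea is identical); the data are hypothetical.

```python
from collections import Counter, defaultdict

# Maximum-likelihood CPTs for the net X -> Y from complete data.
data = [(0, 0), (0, 1), (1, 1), (1, 1), (0, 0), (1, 0)]  # (x, y) pairs, hypothetical

# P(X): marginal counts
x_counts = Counter(x for x, _ in data)
p_x = {x: c / len(data) for x, c in x_counts.items()}

# P(Y | X): counts of y within each slice of x
xy_counts = defaultdict(Counter)
for x, y in data:
    xy_counts[x][y] += 1
p_y_given_x = {x: {y: c / sum(cnt.values()) for y, c in cnt.items()}
               for x, cnt in xy_counts.items()}

print(p_x)          # {0: 0.5, 1: 0.5}
print(p_y_given_x)  # maximum-likelihood CPT for Y given X
```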

  5. Recasting Learning As Inference
  • We've already encountered probabilistic models that have latent (a.k.a. hidden, nonobservable) variables that must be estimated from data.
  • E.g., Weiss model: direction of motion
  • E.g., Gaussian mixture model: to which cluster does each data point belong?
  • Why not treat unknown entries in the conditional probability tables the same way?

  6. Recasting Learning As Inference
  Suppose you have a coin with an unknown bias, θ ≡ P(head). You flip the coin multiple times and observe the outcomes. From the observations, you can infer the bias of the coin. This is learning, and it is also inference: estimating the latent variable θ from data.
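A minimal sketch of this coin example: treat the unknown bias θ as a random variable, put a prior on it, and compute its posterior over a grid of candidate values. The flip counts and the uniform prior are hypothetical choices, not taken from the slides.

```python
import numpy as np

# Learning as inference: posterior over the coin bias theta = P(head),
# computed on a grid after observing flips.
theta = np.linspace(0.001, 0.999, 999)
prior = np.ones_like(theta)            # uniform prior over the bias
prior /= prior.sum()

heads, tails = 7, 3                    # hypothetical observed flips
likelihood = theta**heads * (1 - theta)**tails
posterior = prior * likelihood
posterior /= posterior.sum()           # Bayes rule, normalized on the grid

print("posterior mean bias:", (theta * posterior).sum())  # ~ (7+1)/(10+2) ≈ 0.667
```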

  7. Treating Conditional Probabilities As Latent Variables
  • Graphical model probabilities (priors, conditional distributions) can also be cast as random variables
  • E.g., Gaussian mixture model
  • Remove the knowledge "built into" the links (conditional distributions) and into the nodes (prior distributions)
  • Create new random variables to represent that knowledge → Hierarchical Bayesian Inference
  [Figure: Gaussian mixture model redrawn with parameter nodes q and λ as parents of the z and x nodes, replicated across data points]

  8. Slides stolen from David Heckerman tutorial

  9. [Figure: graphical model with parameter nodes shared between training example 1 and training example 2]

  10. Parameters might not be independent
  [Figure: the same construction over training example 1 and training example 2]

  11. General Approach: Learning Probabilities in a Bayes Net
  If the network structure Sh is known and there is no missing data…
  • We can express the joint distribution over the variables X in terms of a model parameter vector θs
  • Given a random sample D = {x1, x2, ..., xN}, compute the posterior distribution p(θs | D, Sh)
  • Given the posterior distribution, marginals and conditionals on the nodes in the network can be determined
  • This is a probabilistic formulation of all supervised and unsupervised learning problems.

  12. Computing Parameter Posteriors
  E.g., net structure X → Y

  13. Computing Parameter Posteriors
  Given complete data (all X, Y observed) and no direct dependencies among the parameters, the posterior factorizes (parameter independence):
  p(θx, θy|x | D, Sh) = p(θx | D, Sh) p(θy|x | D, Sh)
  • Explanation: given complete data, each set of parameters is d-separated from each other set of parameters in the graph
  [Figure: net X → Y with parameter nodes θx and θy|x; d-separation given the data D shows the parameter sets are independent]
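A minimal sketch of parameter independence for the X → Y net: with complete data, each parameter gets its own posterior, updated only from its own counts. The Beta(1, 1) priors and the data below are hypothetical.

```python
from collections import Counter

# Independent Beta posteriors for theta_x, theta_{y|x=0}, theta_{y|x=1}
# in the net X -> Y with binary variables and complete data.
data = [(0, 0), (0, 1), (1, 1), (1, 1), (0, 0), (1, 0)]   # hypothetical (x, y) pairs

n_x = Counter(x for x, _ in data)
n_x1 = n_x[1]
n_y1_given_x = Counter()
for x, y in data:
    n_y1_given_x[x] += y

# Each posterior is a Beta(a, b) updated only by its own counts.
post_theta_x    = (1 + n_x1, 1 + len(data) - n_x1)
post_theta_y_x0 = (1 + n_y1_given_x[0], 1 + n_x[0] - n_y1_given_x[0])
post_theta_y_x1 = (1 + n_y1_given_x[1], 1 + n_x[1] - n_y1_given_x[1])

print(post_theta_x, post_theta_y_x0, post_theta_y_x1)     # (4, 4) (2, 3) (3, 2)
```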

  14. Posterior Predictive Distribution
  • Given the parameter posteriors, what is the prediction for the next observation xN+1?
    p(xN+1 | D, Sh) = ∫ p(xN+1 | θs, Sh) p(θs | D, Sh) dθs
    The first factor is inference in a fully specified network (what we talked about the past three classes); the second is the parameter posterior (what we just discussed).
  • How can this be used for unsupervised and supervised learning?
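A minimal sketch of this predictive integral for the coin example, approximated by Monte Carlo: draw θ from its posterior and average the probability of the next outcome. The counts and the Beta(1, 1) prior are hypothetical.

```python
import numpy as np

# Posterior predictive p(x_{N+1} | D) = integral of p(x_{N+1} | theta) p(theta | D),
# approximated by sampling theta from its Beta posterior.
rng = np.random.default_rng(0)
heads, tails = 7, 3                                            # hypothetical counts
theta_samples = rng.beta(1 + heads, 1 + tails, size=100_000)   # Beta posterior draws

p_next_head = theta_samples.mean()        # E[theta | D] = P(next flip = head | D)
print("P(next flip is heads | D) ≈", p_next_head)
```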

  15. Prediction Directly From Data
  • In many cases, the prediction can be made without explicitly computing posteriors over the parameters
  • E.g., the coin toss example from an earlier class, with prior Beta(αh, αt) and Nh heads, Nt tails observed (N = Nh + Nt)
  • Posterior distribution: Beta(αh + Nh, αt + Nt)
  • Prediction of the next coin outcome: P(xN+1 = head | D) = (αh + Nh) / (αh + αt + N)
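The same prediction in closed form, which is what "directly from data" means here: the predictive probability is a ratio of counts plus pseudo-counts, so the posterior never has to be represented explicitly. A minimal sketch with hypothetical prior pseudo-counts and observed counts.

```python
# Closed-form posterior predictive for the coin: (a_h + N_h) / (a_h + a_t + N).
a_h, a_t = 1, 1           # hypothetical prior pseudo-counts
N_h, N_t = 7, 3           # hypothetical observed heads/tails

p_next_head = (a_h + N_h) / (a_h + a_t + N_h + N_t)
print("P(next flip is heads | D) =", p_next_head)   # 8/12 ≈ 0.667
```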

  16. Generalizing To Multinomial RVs In Bayes Net
  • Variable Xi is discrete, with values xi1, ..., xiri
  • i: index of the multinomial RV; j: index over configurations of the parents of node i; k: index over values of node i
  • Unrestricted distribution: one parameter per probability, θijk = p(Xi = xik | parents of Xi in configuration j)
  [Figure: node Xi with parents Xa and Xb]

  17. Prediction Directly From Data: Multinomial Random Variables
  • Prior distribution over each parameter vector θij: Dirichlet(αij1, ..., αijri)
  • Posterior distribution: Dirichlet(αij1 + Nij1, ..., αijri + Nijri)
  • Posterior predictive distribution: p(Xi = xik | parent configuration j, D) = (αijk + Nijk) / Σk′ (αijk′ + Nijk′)
  • i: index over nodes; j: index over configurations of the parents of node i; k: index over values of node i
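A minimal sketch of this Dirichlet update for one node Xi: keep per-parent-configuration counts Nijk, add the prior pseudo-counts αijk, and normalize. The counts, pseudo-counts, and parent configurations below are hypothetical.

```python
from collections import defaultdict

# Posterior predictive for a discrete node X_i under a symmetric Dirichlet prior:
# p(X_i = k | pa_i = j, D) = (alpha + N_ijk) / (alpha * r_i + sum_k' N_ijk').
alpha = 1.0                                  # symmetric Dirichlet pseudo-count
r_i = 3                                      # number of values of X_i

counts = defaultdict(lambda: [0] * r_i)      # counts[j][k] = N_ijk
observations = [("j0", 0), ("j0", 2), ("j1", 1), ("j0", 0), ("j1", 1)]  # hypothetical
for j, k in observations:
    counts[j][k] += 1

def predictive(j, k):
    """Posterior predictive p(X_i = k | pa_i = j, D) under the Dirichlet prior."""
    n_jk = counts[j]
    return (alpha + n_jk[k]) / (alpha * r_i + sum(n_jk))

print(predictive("j0", 0))   # (1 + 2) / (3 + 3) = 0.5
print(predictive("j1", 2))   # (1 + 0) / (3 + 2) = 0.2
```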

  18. Other Easy Cases
  • Members of the exponential family (see Barber text, Section 8.5)
  • Linear regression with Gaussian noise (see Barber text, Section 18.1)
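For the second easy case, a minimal sketch of Bayesian linear regression with Gaussian noise and a Gaussian prior on the weights; this uses the standard conjugate result rather than any specific derivation from Barber, and all numbers are hypothetical.

```python
import numpy as np

# Conjugate Bayesian linear regression: known noise variance sigma2 and
# weight prior N(0, tau2 * I) give a Gaussian posterior over the weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                       # hypothetical design matrix
w_true = np.array([1.5, -0.7])
sigma2, tau2 = 0.25, 10.0
y = X @ w_true + rng.normal(scale=np.sqrt(sigma2), size=50)

# Posterior covariance and mean in closed form.
cov = np.linalg.inv(X.T @ X / sigma2 + np.eye(2) / tau2)
mean = cov @ (X.T @ y) / sigma2

print("posterior mean weights:", mean)             # close to w_true
```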
