
Learning Bayesian Networks



Presentation Transcript


  1. Learning Bayesian Networks

  2. Dimensions of Learning

  3. Learning Bayes nets from data. [Figure: a table of cases over variables X1 (true/false), X2 (integer), X3 (real), ..., together with prior/expert information, feeds a Bayes-net learner, which outputs Bayes net(s) over X1…X9.]

  4. From thumbtacks to Bayes nets. The thumbtack problem can be viewed as learning the probability for a very simple BN with a single heads/tails variable X. [Figure: parameter node Θ with arcs to X1, X2, ..., XN, one node per toss (toss 1, toss 2, ..., toss N).]

  5. The next simplest Bayes net: two heads/tails variables X and Y with no arc between them. [Figure: two thumbtacks, one labeled "heads", one labeled "tails".]

  6. The next simplest Bayes net. [Figure: parameter node ΘX with arcs to X1, X2, ..., XN and parameter node ΘY with arcs to Y1, Y2, ..., YN, one (Xi, Yi) pair per case (case 1, case 2, ..., case N); a "?" asks whether ΘX and ΘY are dependent.]

  7. The next simplest Bayes net: "parameter independence". [Figure: the same parameter network with no arc between ΘX and ΘY, i.e. the parameters are independent a priori.]

  8. The next simplest Bayes net: "parameter independence" means the task splits into two separate thumbtack-like learning problems, one for ΘX over the Xi and one for ΘY over the Yi.

  9. A bit more difficult... X → Y, both heads/tails. Three probabilities to learn: • θX=heads • θY=heads|X=heads • θY=heads|X=tails

  10. A bit more difficult... [Figure: parameter nodes ΘX, ΘY|X=heads, and ΘY|X=tails above the cases; case 1 has X1 = heads with Y1, case 2 has X2 = tails with Y2, and so on.]

  11. A bit more difficult... [Figure: the same parameter network with the observed values hidden, showing only ΘX, ΘY|X=heads, ΘY|X=tails and the case nodes X1, Y1, X2, Y2, ....]

  12. A bit more difficult... [Figure: the same network annotated with "?" marks asking which parameter nodes each Xi and Yi depend on.]

  13. A bit more difficult... With complete data these are 3 separate thumbtack-like problems: one for ΘX, one for ΘY|X=heads, and one for ΘY|X=tails (see the sketch below).
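As an illustrative sketch (not from the slides; all names are made up), here is how complete data for X → Y splits into three independent counting problems:

    # Three thumbtack-like problems: with complete data, each parameter
    # is estimated from its own slice of the cases.
    def mle_parameters(cases):
        """cases: list of (x, y) pairs, each value 'heads' or 'tails'."""
        y_if_heads = [y for x, y in cases if x == "heads"]   # problem 2
        y_if_tails = [y for x, y in cases if x == "tails"]   # problem 3
        theta_x = len(y_if_heads) / len(cases)               # problem 1
        theta_y_given_h = y_if_heads.count("heads") / len(y_if_heads)
        theta_y_given_t = y_if_tails.count("heads") / len(y_if_tails)
        return theta_x, theta_y_given_h, theta_y_given_t

    cases = [("heads", "heads"), ("heads", "tails"),
             ("tails", "heads"), ("heads", "heads")]
    print(mle_parameters(cases))   # -> (0.75, 0.666..., 1.0)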

  14. In general… Learning probabilities in a Bayes net is straightforward if: • complete data • local distributions from the exponential family (binomial, Poisson, gamma, ...) • parameter independence • conjugate priors

  15. Incomplete data makes parameters dependent. [Figure: the X → Y parameter network again; with a case's value unobserved, ΘX, ΘY|X=heads, and ΘY|X=tails are no longer independent given the data.]

  16. Solution: Use EM • Initialize parameters ignoring missing data • E step: Infer missing values using current parameters • M step: Estimate parameters using completed data • Can also use gradient descent (see the sketch below)
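A minimal EM sketch for the X → Y net with some Y values missing (None). It assumes binary variables and illustrative names; it is not the slides' own code:

    # EM for X -> Y with missing Y values. Since X is fully observed
    # here, only P(Y=heads | X) needs the E step.
    def em(cases, iters=20):
        theta_x = sum(x == "heads" for x, _ in cases) / len(cases)
        theta_y = {"heads": 0.5, "tails": 0.5}   # simple initial guess
        for _ in range(iters):
            num = {"heads": 0.0, "tails": 0.0}
            den = {"heads": 0.0, "tails": 0.0}
            for x, y in cases:
                den[x] += 1.0
                if y is None:                # E step: expected value of the
                    num[x] += theta_y[x]     # missing Y under current params
                elif y == "heads":
                    num[x] += 1.0
            theta_y = {x: num[x] / den[x] for x in theta_y}   # M step
        return theta_x, theta_y

    cases = [("heads", "heads"), ("heads", None), ("heads", "heads"),
             ("tails", "tails"), ("tails", None)]
    print(em(cases))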

  17. Learning Bayes-net structure. Given data, which model is correct? Model 1: X Y (no arc). Model 2: X → Y.

  18. Bayesian approach. Given data d, which model is correct, i.e. more likely? Model 1: X Y (no arc). Model 2: X → Y.

  19. Bayesian approach: Model averaging. Given data d, which model is more likely? Average the predictions of model 1 (X Y, no arc) and model 2 (X → Y), weighting each by its posterior probability (see the formula below).
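In symbols, model averaging predicts with the posterior-weighted mixture (a standard identity, not written out on the slide):

    p(x \mid d) = \sum_{m} p(m \mid d)\, p(x \mid m, d)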

  20. Bayesian approach: Model selection. Given data d, which model is more likely? Keep the best model, for the sake of: explanation, understanding, tractability.

  21. To score a model, use Bayes' theorem. Given data d, the model score is the posterior

    p(m \mid d) \propto p(m)\, p(d \mid m),
    \qquad
    p(d \mid m) = \int p(d \mid \theta_m, m)\, p(\theta_m \mid m)\, d\theta_m

where p(d | m) is the "marginal likelihood" and p(d | θm, m) is the likelihood.

  22. Thumbtack example: a single heads/tails variable X with a conjugate (Beta) prior.
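The slide's equations do not survive in the transcript; the standard conjugate-prior facts it refers to are, with α = αh + αt and counts Nh heads and Nt tails out of N = Nh + Nt tosses:

    p(\theta) = \mathrm{Beta}(\theta \mid \alpha_h, \alpha_t)
              = \frac{\Gamma(\alpha)}{\Gamma(\alpha_h)\,\Gamma(\alpha_t)}\,
                \theta^{\alpha_h - 1} (1 - \theta)^{\alpha_t - 1}

    p(d) = \frac{\Gamma(\alpha)}{\Gamma(\alpha + N)} \cdot
           \frac{\Gamma(\alpha_h + N_h)}{\Gamma(\alpha_h)} \cdot
           \frac{\Gamma(\alpha_t + N_t)}{\Gamma(\alpha_t)}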

  23. More complicated graphs. For X → Y (both heads/tails), the marginal likelihood factors into 3 separate thumbtack-like learning problems: one for X, one for Y|X=heads, one for Y|X=tails.

  24. Model score for a discrete Bayes net (see the closed form below).
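The scoring formula itself is lost from the transcript; for discrete variables with Dirichlet priors it is the standard marginal likelihood (Cooper & Herskovits, 1992; Heckerman, Geiger & Chickering, 1995):

    p(d \mid m) = \prod_{i=1}^{n} \prod_{j=1}^{q_i}
        \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})}
        \prod_{k=1}^{r_i}
        \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}

where r_i is the number of states of X_i, q_i the number of configurations of its parents, N_{ijk} the count of cases with X_i = k and parents in configuration j, N_{ij} = \sum_k N_{ijk}, and \alpha_{ij} = \sum_k \alpha_{ijk}.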

  25. Computation of marginal likelihood. Efficient closed form if: • local distributions from the exponential family (binomial, Poisson, gamma, ...) • parameter independence • conjugate priors • no missing data (including no hidden variables)

  26. Structure search. Finding the BN structure with the highest score among those structures with at most k parents is NP-hard for k > 1 (Chickering, 1995). Heuristic methods: • greedy • greedy with restarts • MCMC methods. [Flowchart: initialize structure; score all possible single changes; perform the best change; if any change is better, repeat; otherwise return the saved structure (see the sketch below).]
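A compact sketch of that greedy loop. Here score() and neighbors() are placeholders (not from the slides) for a scoring function, e.g. the marginal likelihood above, and a generator of all single-arc changes:

    # Greedy structure search: take the best single change until no
    # change improves the score, then return the saved structure.
    def greedy_search(init_structure, data, score, neighbors):
        best = init_structure
        best_score = score(best, data)
        while True:
            improved = False
            for cand in neighbors(best):     # all single arc additions,
                s = score(cand, data)        # deletions, and reversals
                if s > best_score:
                    best, best_score, improved = cand, s, True
            if not improved:
                return best, best_score

Greedy with restarts wraps this loop: perturb the returned structure (or restart from a random one) and rerun, keeping the best result across runs.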

  27. Structure priors. 1. All possible structures equally likely. 2. Partial ordering, required / prohibited arcs. 3. Prior(m) ∝ Similarity(m, prior BN).

  28. Parameter priors • All uniform: Beta(1,1) • Use a prior Bayes net

  29. Parameter priors. Recall the intuition behind the Beta prior for the thumbtack: • The hyperparameters αh and αt can be thought of as imaginary counts from our prior experience, starting from "pure ignorance" • Equivalent sample size = αh + αt • The larger the equivalent sample size, the more confident we are about the long-run fraction (see the update below)
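As a one-line reminder (standard Beta updating, not spelled out in the transcript): after Nh heads and Nt tails,

    p(\theta \mid d) = \mathrm{Beta}(\theta \mid \alpha_h + N_h, \alpha_t + N_t),
    \qquad
    p(X_{N+1} = \text{heads} \mid d) = \frac{\alpha_h + N_h}{\alpha + N}

so the imaginary counts are simply added to the observed ones, and a larger equivalent sample size α moves the estimate less.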

  30. Parameter priors. A prior Bayes net over X1…Xn plus an equivalent sample size give an imaginary count for any variable configuration; adding parameter modularity yields parameter priors for any Bayes-net structure over X1…Xn. [Figure: prior network over x1…x9.]
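In the standard construction, this imaginary count is the equivalent sample size α times the probability the prior network assigns to the configuration (the BDe prior of Heckerman, Geiger & Chickering, 1995; notation mine):

    \alpha_{ijk} = \alpha \cdot p(X_i = k, \mathrm{Pa}_i = j \mid \text{prior network})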

  31. Combining knowledge & data. A prior network + equivalent sample size, combined with data, yield improved network(s). [Figure: a prior network over x1…x9 and a data table of cases over x1 (true/false), x2, x3, ... feed the learner, which outputs improved network(s).]
