
Guidance: Assignment 3 Part 1


Presentation Transcript


  1. Guidance: Assignment 3 Part 1 • MATLAB functions in the Statistics Toolbox • betacdf, betapdf, betarnd, betastat, betafit
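
A minimal MATLAB sketch of how these Statistics Toolbox functions fit together; the shape parameters (a = 2, b = 5) and the sample size are arbitrary illustration values, not assignment settings:

    % Beta distribution utilities from the Statistics Toolbox.
    a = 2; b = 5;                      % illustrative shape parameters
    p = betapdf(0.3, a, b);            % density at x = 0.3
    c = betacdf(0.3, a, b);            % P(X <= 0.3)
    [m, v] = betastat(a, b);           % theoretical mean and variance
    x = betarnd(a, b, 1000, 1);        % 1000 random draws
    phat = betafit(x);                 % ML estimates [ahat bhat] from data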

  2. Guidance: Assignment 3 Part 2 • You will explore the role of the priors. • The Weiss model showed that priors play an important role when • observations are noisy • observations don’t provide strong constraints • there aren’t many observations.

  3. Guidance: Assignment 3 Part 3 • Implement a model somewhat like Weiss et al. (2002) • Goal: infer the motion (velocity) of a rigid shape from observations at two instants in time. • Assume distinctive features that make it easy to identify the location of each feature at successive times.

  4. Assignment 2 Guidance • Bx: the x displacement of the blue square (= delta x in one unit of time) • By: the y displacement of the blue square • Rx: the x displacement of the red square • Ry: the y displacement of the red square • These observations are corrupted by measurement noise: Gaussian, mean zero, standard deviation σ • D: direction of motion (up, down, left, right) • Assume the only possibilities are one unit of motion in one of the four directions

  5. Assignment 2: Generative Model • Rx conditioned on D = up is drawn from a Gaussian • Same assumptions for Bx, By.
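
A minimal sketch of one way to simulate this generative model in MATLAB (my reading of the slides, not official assignment code; the value of sigma and all variable names are illustrative):

    % One unit of motion in direction D; observed displacements are the
    % true displacements plus zero-mean Gaussian noise with std sigma.
    sigma = 0.5;
    dirs = {'up', 'down', 'left', 'right'};
    mu = [0 1; 0 -1; -1 0; 1 0];       % mean [dx dy] for each direction
    d = randi(4);                      % sample D uniformly
    Bx = mu(d,1) + sigma*randn;        % blue square, x displacement
    By = mu(d,2) + sigma*randn;        % blue square, y displacement
    Rx = mu(d,1) + sigma*randn;        % red square, x displacement
    Ry = mu(d,2) + sigma*randn;        % red square, y displacement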

  6. Assignment 2 Math • By conditional independence of the observations given D: P(D | Bx, By, Rx, Ry) ∝ P(D) P(Bx | D) P(By | D) P(Rx | D) P(Ry | D)

  7. Assignment 2 Implementation • Quiz: do we need to worry about the Gaussian density function's normalization term?
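
A sketch of the inference step, reusing mu, sigma, and the observations from the previous sketch. It also hints at the quiz answer: the Gaussian normalization term 1/(σ√(2π)) is identical for every hypothesis D, so it cancels when the posterior is normalized:

    obs = [Bx By Rx Ry];
    like = zeros(4,1);
    for d = 1:4
        m = [mu(d,1) mu(d,2) mu(d,1) mu(d,2)];   % predicted displacements
        like(d) = prod(exp(-(obs - m).^2 / (2*sigma^2)));
    end
    prior = ones(4,1) / 4;             % uniform prior over directions
    post = like .* prior;
    post = post / sum(post)            % P(D | Bx, By, Rx, Ry)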

  8. Introduction To Bayes Nets (Stuff stolen from Kevin Murphy, UBC, and Nir Friedman, HUJI)

  9. What Do You Need To Do Probabilistic Inference In A Given Domain? • Joint probability distribution over all variables in domain

  10. Bayes Nets (a.k.a. Belief Nets) • Compact representation of joint probability distributions via conditional independence • Qualitative part: a directed acyclic graph (DAG); nodes are random variables, edges are direct influences • Quantitative part: a set of conditional probability distributions, e.g. the CPT for the Alarm family, P(A | E, B) (columns: a, ¬a):
      e, b:   0.9,  0.1
      e, ¬b:  0.2,  0.8
      ¬e, b:  0.9,  0.1
      ¬e, ¬b: 0.01, 0.99
• Together they define a unique distribution in a factored form (Figure: Burglary → Alarm ← Earthquake, Earthquake → Radio, Alarm → Call; from N. Friedman)

  11. What Is A Bayes Net? • A node is conditionally independent of its ancestors given its parents. E.g., C is conditionally independent of R, E, and B given A. Notation: C ⊥ R, B, E | A • Quiz: What sort of parameter reduction do we get? From 2^5 - 1 = 31 parameters to 1 + 1 + 2 + 4 + 2 = 10 (Figure: the burglary network)
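
A sketch of the factored representation in MATLAB, using the CPT from slide 10 (index 1 = false, 2 = true; the node order [B E A R C], the marginals for B and E, and the CPTs for R and C are made-up illustration values, since the slides don't give them). The count of ten free parameters from the quiz is visible in the table sizes:

    pB = [0.99 0.01];                  % P(B)  (illustrative)
    pE = [0.99 0.01];                  % P(E)  (illustrative)
    pA = zeros(2,2,2);                 % P(A | E, B), values from slide 10
    pA(2,2,2) = 0.9;  pA(2,2,1) = 0.2; pA(2,1,2) = 0.9; pA(2,1,1) = 0.01;
    pA(1,:,:) = 1 - pA(2,:,:);
    pR = [0.999 0.001; 0.35 0.65];     % P(R | E), rows = E  (illustrative)
    pC = [0.95 0.05; 0.30 0.70];       % P(C | A), rows = A  (illustrative)
    % Factored joint: P(B,E,A,R,C) = P(B) P(E) P(A|E,B) P(R|E) P(C|A)
    jointP = @(b,e,a,r,c) pB(b) * pE(e) * pA(a,e,b) * pR(e,r) * pC(a,c);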

  12. Conditional Distributions Are Flexible • E.g., Earthquake and Burglary might have independent effects on Alarm • A.k.a. noisy-OR: P(A = 1 | B, E) = 1 - (1 - pB)^B (1 - pE)^E • where pB and pE are the probabilities of the alarm given burglary alone and earthquake alone • This constraint reduces the # of free parameters to 8!
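
A minimal noisy-OR sketch (the values of pB and pE are arbitrary illustration values):

    pB = 0.9;  pE = 0.2;    % P(alarm | burglary alone), P(alarm | quake alone)
    % P(A = 1 | B, E) = 1 - (1 - pB)^B * (1 - pE)^E, with B, E in {0, 1}
    noisyOr = @(B, E) 1 - (1 - pB)^B * (1 - pE)^E;
    noisyOr(1, 1)           % both causes present: 1 - 0.1*0.8 = 0.92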

  13. Why Are Bayes Nets Useful? • Factored representation may have exponentially fewer parameters than full joint • Lower time complexity (i.e., easier inference) • Lower sample complexity (i.e., less data for learning) • Graph structure supports • Modular representation of knowledge • Local, distributed algorithms for inference and learning • Intuitive (possibly causal) interpretation • Strong theory about the nature of cognition or the generative process that produces observed data • Can’t represent arbitrary contingencies among variables, so theory can be rejected by data

  14. Inference • Computing posterior probabilities • Probability of hidden events given any evidence • Most likely explanation • Scenario that explains evidence • Rational decision making • Maximize expected utility • Value of information • Effect of intervention • Causal analysis (Figure: the burglary network, illustrating the explaining-away effect; from N. Friedman)

  15. A Real Bayes Net: Alarm • Domain: monitoring intensive-care patients • 37 variables • 509 parameters • …instead of 2^37 (Figure: the ALARM network, 37 nodes from MINVOLSET through BP; from N. Friedman)

  16. More Real-World Bayes Net Applications • “Microsoft’s competitive advantage lies in its expertise in Bayesian networks”-- Bill Gates, quoted in LA Times, 1996 • MS Answer Wizards, (printer) troubleshooters • Medical diagnosis • Speech recognition (HMMs) • Gene sequence/expression analysis • Turbocodes (channel coding)

  17. Conditional Independence • A node is conditionally independent of its ancestors given its parents. • What about conditional independence between variables that aren't directly connected (e.g., Burglary and Radio)? (Figure: the burglary network)

  18. d-separation • Criterion for deciding if nodes are conditionally independent. • A path from node u to node v is d-separated by a set of nodes Z if the path matches one of these templates:
      u → z → v   (z in Z)
      u ← z ← v   (z in Z)
      u ← z → v   (z in Z)
      u → z ← v   (z not in Z, and no descendant of z in Z)

  19. Conditional Independence • Nodes u and v are conditionally independent given set Z if all (undirected) paths between u and v are d-separated by Z. (Figure: an example graph in which every path from u to v is blocked by a node in Z)

  20. d-separation Along Paths • For paths involving > 1 intermediate node, the path is d-separated if the outer two nodes of any triple along it are d-separated. (Figure: worked examples of longer paths that are and are not d-separated)

  21. (Figure only: the full ALARM network.)

  22. (Figure only: the full ALARM network, repeated.)

  23. Sufficiency For Conditional Independence: Markov Blanket • The Markov blanket of node u consists of the parents, children, and children's parents of u • P(u | MB(u), v) = P(u | MB(u)) (Figure: node u with its Markov blanket highlighted)

  24. Probabilistic Models • Probabilistic models include graphical models, which split into directed (Bayesian belief nets) and undirected (Markov nets) • Directed examples: Alarm network, state-space models, HMMs, Naïve Bayes classifier, PCA/ICA • Undirected examples: Markov random field, Boltzmann machine, Ising model, max-ent model, log-linear models

  25. Turning A Directed Graphical Model Into An Undirected Model Via Moralization • Moralization: connect all parents of each node and remove arrows
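
A sketch of moralization on an adjacency matrix, using the burglary network (the node order [B E A R C] is my choice for illustration):

    A = zeros(5);                      % A(i,j) = 1 means edge i -> j
    A(1,3) = 1; A(2,3) = 1;            % Burglary -> Alarm, Earthquake -> Alarm
    A(2,4) = 1; A(3,5) = 1;            % Earthquake -> Radio, Alarm -> Call
    M = A | A';                        % drop edge directions
    for k = 1:5
        par = find(A(:,k));            % parents of node k
        M(par, par) = 1;               % "marry" every pair of parents
    end
    M = M & ~eye(5);                   % remove self-loops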

  26. Toy Example Of A Markov Net • Clique: (largest) subset of vertices such that each pair is connected by an edge • Xi ⊥ Xrest | Xnbrs, e.g., X1 ⊥ X4, X5 | X2, X3 • The joint is a normalized product of clique potential functions: P(x) = (1/Z) Πc Ψc(xc), where the partition function Z = Σx Πc Ψc(xc) sums over all joint states (Figure: five-node net X1…X5 with its cliques marked)
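
A tiny concrete instance with two binary nodes and a single clique potential (values illustrative); enumerating all states makes the partition function explicit:

    psi = [4 1; 1 4];                  % potential favoring x1 == x2
    Z = sum(psi(:));                   % partition function: sum over states
    P = psi / Z;                       % joint distribution P(x1, x2)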

  27. A Real Markov Net • Latent causes xi, observed pixels yi • Estimate P(x1, …, xn | y1, …, yn) • Ψ(xi, yi) = P(yi | xi): local evidence likelihood • Ψ(xi, xj) = exp(-J(xi, xj)): compatibility matrix between neighboring latent causes (Figure: grid-structured MRF over the image)

  28. Example Of Image Segmentation With MRFs (Figure: segmentation results from Sziranyi et al., 2000)

  29. Graphical Models Are A Useful Formalism • E.g., the Naïve Bayes model: D is the single parent of Rx, Ry, Bx, By • By the definition of conditional probability, P(D | Rx, Ry, Bx, By) = P(D, Rx, Ry, Bx, By) / P(Rx, Ry, Bx, By), where the denominator is obtained by marginalizing the joint over D

  30. Graphical Models Are A Useful Formalism • E.g., a feedforward neural net • With noise, a sigmoid belief net (Figure: input layer, hidden layer, output layer)

  31. Graphical Models Are A Useful Formalism • E.g., Restricted Boltzmann machine (Hinton) • Also known as Harmony network (Smolensky) (Figure: a layer of hidden units fully connected to a layer of visible units)

  32. Graphical Models Are A Useful Formalism • E.g., Gaussian Mixture Model
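
A one-dimensional sketch of the mixture model (weights, means, and standard deviations are illustrative values):

    w = [0.3 0.7];  m = [-2 2];  s = [1 0.5];
    z = find(rand < cumsum(w), 1);     % sample the mixture component
    x = m(z) + s(z)*randn;             % sample an observation given z
    pdf = @(x) sum(w .* normpdf(x, m, s));   % p(x) = sum_k w_k N(x; m_k, s_k)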

  33. Graphical Models Are A Useful Formalism • E.g., dynamical (time varying) models in which data arrives sequentially or output is produced as a sequence • Dynamic Bayes nets (DBNs) can be used to model such time-series (sequence) data • Special cases of DBNs include • Hidden Markov Models (HMMs) • State-space models

  34. Hidden Markov Model (HMM) • Hidden state chain X1 → X2 → X3 (phones/words), linked by a transition matrix • Observations Y1, Y2, Y3 (acoustic signal), emitted via Gaussian observation densities
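
A minimal forward-algorithm sketch for a discrete-observation HMM (all numbers are illustrative; a speech HMM would use Gaussian observation densities as on the slide):

    T = [0.7 0.3; 0.4 0.6];            % T(i,j) = P(X_t = j | X_t-1 = i)
    O = [0.9 0.1; 0.2 0.8];            % O(i,k) = P(Y = k | X = i)
    init = [0.5 0.5];                  % P(X_1)
    y = [1 1 2];                       % observed sequence
    alpha = init .* O(:, y(1))';       % P(X_1, y_1)
    for t = 2:numel(y)
        alpha = (alpha * T) .* O(:, y(t))';  % predict, then weigh evidence
    end
    pSeq = sum(alpha);                 % likelihood P(y_1:T)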

  35. State-Space Model (SSM) / Linear Dynamical System (LDS) • Hidden chain X1 → X2 → X3: the "true" state • Y1, Y2, Y3: noisy observations

  36. Example: LDS For 2D Tracking • A sparse linear-Gaussian system (Figure: the transition and observation matrices for tracking 2D position, with noise terms Q and R; most entries are zero)

  37. Kalman Filtering (Recursive State Estimation In An LDS) • Estimate P(Xt | y1:t) from P(Xt-1 | y1:t-1) and yt • Predict: P(Xt | y1:t-1) = ΣXt-1 P(Xt | Xt-1) P(Xt-1 | y1:t-1) • Update: P(Xt | y1:t) ∝ P(yt | Xt) P(Xt | y1:t-1)
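
One predict/update cycle in MATLAB, matching the two equations above but with a continuous state, so the sum becomes the standard linear-Gaussian update (all model matrices and values are illustrative):

    F = [1 1; 0 1];  H = [1 0];        % dynamics and observation matrices
    Q = 0.01*eye(2); R = 0.25;         % process and observation noise
    x = [0; 1];      P = eye(2);       % mean, covariance of P(X_t-1 | y_1:t-1)
    % Predict: P(X_t | y_1:t-1)
    x = F*x;  P = F*P*F' + Q;
    % Update with measurement yt: P(X_t | y_1:t)
    yt = 1.2;
    K = P*H' / (H*P*H' + R);           % Kalman gain
    x = x + K*(yt - H*x);
    P = (eye(2) - K*H)*P;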

  38. Mike's Project of the Week • IRT model (Figure: plate diagram with variables δ, α, G, X, P and plates over trials and students)

  39. Mike's Project of the Week • BKT model (Figure: plate diagram with variables G, S, L0, τ, X, T and plates over trials and students)

  40. Mike's Project of the Week • IRT+BKT model (Figure: plate diagram with variables δ, γ, σ, α, η, G, S, L0, τ, X, P, T and plates over trials and students)

  41. Why Are Bayes Nets Useful? • Factored representation may have exponentially fewer parameters than full joint • Lower time complexity (i.e., easier inference) • Lower sample complexity (i.e., less data for learning) • Graph structure supports • Modular representation of knowledge • Local, distributed algorithms for inference and learning • Intuitive (possibly causal) interpretation • Strong theory about the nature of cognition or the generative process that produces observed data • Can’t represent arbitrary contingencies among variables, so theory can be rejected by data
