
Probabilistic Graphical Models






Presentation Transcript


  1. Probabilistic Graphical Models David Madigan Rutgers University madigan@stat.rutgers.edu

  2. Expert Systems
Example rule: IF the infection is primary-bacteremia AND the site of the culture is one of the sterile sites AND the suspected portal of entry is the gastrointestinal tract THEN there is suggestive evidence (0.7) that the infection is bacteroid.
• Explosion of interest in "Expert Systems" in the early 1980s • Many companies (Teknowledge, IntelliCorp, Inference, etc.), many IPOs, much media hype • Ad-hoc uncertainty handling

  3. Uncertainty in Expert Systems
If A then C (p1); If B then C (p2). What if both A and B are true? Then C is true with certainty factor CF = p1 + p2 × (1 - p1).
"Currently fashionable ad-hoc mumbo jumbo" (A.F.M. Smith)
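The certainty-factor combination above is simple arithmetic; here is a minimal Python sketch (the function name is mine, not from the slides), for the case of two positive certainty factors:

```python
def combine_cf(p1, p2):
    """MYCIN-style combination of two positive certainty factors that
    support the same conclusion C: CF = p1 + p2 * (1 - p1)."""
    return p1 + p2 * (1.0 - p1)

# If rule A gives C with CF 0.7 and rule B gives C with CF 0.4:
print(combine_cf(0.7, 0.4))  # 0.82 -- an ad-hoc rule, not a probability calculus
```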

  4. Eschewed Probabilistic Approach • Computationally intractable • Inscrutable • Requires vast amounts of data/elicitation: e.g., for n dichotomous variables one needs 2^n - 1 probabilities to fully specify the joint distribution

  5. Conditional Independence: X ⊥ Y | Z

  6. Conditional Independence
• Suppose A and B are marginally independent: Pr(A), Pr(B), and Pr(C|A,B) for each of the 4 parent configurations, i.e. 1 + 1 + 4 = 6 probabilities
• Suppose instead A and C are conditionally independent given B (the chain A → B → C, so C ⊥ A | B): Pr(A), Pr(B|A) for the 2 states of A, Pr(C|B) for the 2 states of B, i.e. 1 + 2 + 2 = 5 probabilities
• A chain of 50 binary variables requires 99 probabilities, versus 2^50 - 1 for the unrestricted joint
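A small counting sketch of the savings claimed above (the helper names are mine); it contrasts a full joint over n binary variables with a Markov chain:

```python
def full_joint_params(n):
    """Free parameters in an unrestricted joint over n binary variables."""
    return 2 ** n - 1

def chain_params(n):
    """Free parameters in a chain X1 -> X2 -> ... -> Xn of binary variables:
    Pr(X1), plus Pr(Xi | Xi-1) for each of the two states of Xi-1."""
    return 1 + 2 * (n - 1)

print(chain_params(3), full_joint_params(3))    # 5 vs 7
print(chain_params(50), full_joint_params(50))  # 99 vs 1125899906842623
```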

  7. Properties of Conditional Independence (Dawid, 1980)
For any probability measure P and random variables A, B, and C:
CI1: A ⊥ B [P] ⇒ B ⊥ A [P]
CI2: A ⊥ (B, C) [P] ⇒ A ⊥ B [P]
CI3: A ⊥ (B, C) [P] ⇒ A ⊥ B | C [P]
CI4: A ⊥ B and A ⊥ C | B [P] ⇒ A ⊥ (B, C) [P]
Some probability measures also satisfy:
CI5: A ⊥ B | C and A ⊥ C | B [P] ⇒ A ⊥ (B, C) [P]
CI5 is satisfied whenever P has a positive joint probability density with respect to some product measure
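These properties can be checked numerically against any explicit joint distribution. The sketch below (all helper names are mine) tests A ⊥ B | S directly from a joint table and illustrates CI1 (symmetry) on the A → B → C chain used later in the deck:

```python
import itertools
from collections import defaultdict

def marginal(joint, keep):
    """joint maps full assignments (tuples of 0/1) to probabilities;
    returns the marginal over the variable indices listed in `keep`."""
    m = defaultdict(float)
    for x, p in joint.items():
        m[tuple(x[i] for i in keep)] += p
    return m

def cond_indep(joint, A, B, S, tol=1e-10):
    """A _||_ B | S  iff  P(a,b,s) * P(s) == P(a,s) * P(b,s) for all a, b, s.
    A, B, S are disjoint lists of variable indices."""
    pabs = marginal(joint, A + B + S)
    pas, pbs, ps = marginal(joint, A + S), marginal(joint, B + S), marginal(joint, S)
    for key, p in pabs.items():
        a, b, s = key[:len(A)], key[len(A):len(A) + len(B)], key[len(A) + len(B):]
        if abs(p * ps[s] - pas[a + s] * pbs[b + s]) > tol:
            return False
    return True

# Chain A -> B -> C with the toy numbers from slide 21 (variable indices 0, 1, 2):
pA = {1: 0.7, 0: 0.3}
pB = {1: {1: 0.5, 0: 0.5}, 0: {1: 0.1, 0: 0.9}}   # pB[a][b] = Pr(B=b | A=a)
pC = {1: {1: 0.2, 0: 0.8}, 0: {1: 0.6, 0: 0.4}}   # pC[b][c] = Pr(C=c | B=b)
joint = {(a, b, c): pA[a] * pB[a][b] * pC[b][c]
         for a, b, c in itertools.product((0, 1), repeat=3)}

print(cond_indep(joint, [0], [2], [1]))   # A _||_ C | B : True
print(cond_indep(joint, [2], [0], [1]))   # CI1 symmetry : True
print(cond_indep(joint, [0], [2], []))    # A _||_ C marginally: False
```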

  8. Markov Properties for Undirected Graphs
(Global) S separates A from B ⇒ A ⊥ B | S
(Local) a ⊥ V \ cl(a) | bd(a)
(Pairwise) a ⊥ b | V \ {a, b}
(G) ⇒ (L) ⇒ (P)
[Figure: undirected graph on vertices A, B, C, D, E.] Example: (1) B ⊥ {E, D} | {A, C}; (2) B ⊥ D | {A, C, E}. To go from (2) to (1) need E ⊥ B | A, C? or CI5.
Lauritzen, Dawid, Larsen & Leimer (1990)
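The global Markov property rests on plain graph separation; here is a minimal sketch using networkx (not part of the slides) that removes S and tests whether any A-B path survives:

```python
import networkx as nx

def separates(G, A, B, S):
    """True iff S separates A from B in the undirected graph G
    (A, B, S assumed disjoint): every path from A to B passes through S."""
    H = G.copy()
    H.remove_nodes_from(S)
    return not any(nx.has_path(H, a, b) for a in A for b in B)

# Chordless four-cycle A - C - B - D - A:
G = nx.Graph([("A", "C"), ("C", "B"), ("B", "D"), ("D", "A")])
print(separates(G, ["A"], ["B"], ["C", "D"]))  # True
print(separates(G, ["A"], ["B"], ["C"]))       # False (the path A-D-B survives)
```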

  9. Factorizations
A density f is said to "factorize according to G" if f(x) = ∏_{C ∈ 𝒞} ψ_C(x_C), where 𝒞 is the set of cliques of G and the ψ_C are "clique potentials" • cliques are maximal complete subgraphs
Proposition: If f factorizes according to a UG G, then it also obeys the global Markov property.
"Proof": Let S separate A from B in G and assume V = A ∪ B ∪ S. Let 𝒞_A be the set of cliques with non-empty intersection with A. Since S separates A from B, we must have C ∩ B = ∅ for all C in 𝒞_A. Then:
f(x) = ∏_{C ∈ 𝒞_A} ψ_C(x_C) × ∏_{C ∉ 𝒞_A} ψ_C(x_C) = h(x_{A∪S}) k(x_{B∪S}), which gives A ⊥ B | S.

  10. Markov Properties for Acyclic Directed Graphs (Bayesian Networks)
(Global) S separates A from B in (G_an(A∪B∪S))^m, the moral graph of the subgraph induced by the smallest ancestral set containing A ∪ B ∪ S ⇒ A ⊥ B | S
(Local) a ⊥ nd(a) \ pa(a) | pa(a)
(G) ⇔ (L)
[Figure: example DAG with sets A, B, and S.]
Lauritzen, Dawid, Larsen & Leimer (1990)

  11. Factorizations
A density f admits a "recursive factorization" according to an ADG G if f(x) = ∏_{v ∈ V} f(x_v | x_pa(v))
ADG Global Markov Property ⇔ f(x) = ∏_{v ∈ V} f(x_v | x_pa(v))
Lemma: If P admits a recursive factorization according to an ADG G, then P factorizes according to the moral graph G^m (and according to chordal supergraphs of G^m)
Lemma: If P admits a recursive factorization according to an ADG G, and A is an ancestral set in G, then the marginal P_A admits a recursive factorization according to the subgraph G_A
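The recursive factorization is easy to evaluate pointwise; a minimal sketch (the dictionary layout and names are mine), using the A → B → C chain that appears later in the deck:

```python
def joint_from_adg(cpts, parents, x):
    """Recursive ADG factorization: f(x) = product over v of f(x_v | x_pa(v)).
    cpts[v] maps (value of v, tuple of parent values) -> probability;
    parents[v] is the tuple of v's parents; x maps each variable to its value."""
    p = 1.0
    for v, cpt in cpts.items():
        p *= cpt[(x[v], tuple(x[u] for u in parents[v]))]
    return p

parents = {"A": (), "B": ("A",), "C": ("B",)}
cpts = {
    "A": {(1, ()): 0.7, (0, ()): 0.3},
    "B": {(1, (1,)): 0.5, (0, (1,)): 0.5, (1, (0,)): 0.1, (0, (0,)): 0.9},
    "C": {(1, (1,)): 0.2, (0, (1,)): 0.8, (1, (0,)): 0.6, (0, (0,)): 0.4},
}
print(joint_from_adg(cpts, parents, {"A": 1, "B": 1, "C": 0}))  # 0.7*0.5*0.8 = 0.28
```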

  12. Markov Properties for Acyclic Directed Graphs (Bayesian Networks)
(Global) S separates A from B in (G_an(A∪B∪S))^m ⇒ A ⊥ B | S
(Local) a ⊥ nd(a) \ pa(a) | pa(a)
• nd(a) is an ancestral set; pa(a) obviously separates a from nd(a) \ pa(a) in (G_an({a} ∪ nd(a)))^m
(G) ⇒ (L)
(L) ⇒ (factorization): induction on the number of vertices

  13. d-separation
A chain p from a to b in an acyclic directed graph G is said to be blocked by S if it contains a vertex g ∈ p such that either:
- g ∈ S and the arrows of p do not meet head-to-head at g, or
- g ∉ S, g has no descendants in S, and the arrows of p do meet head-to-head at g.
Two subsets A and B are d-separated by S if all chains from A to B are blocked by S.

  14. d-separation and the global Markov property
Let A, B, and S be disjoint subsets of a directed acyclic graph G. Then S d-separates A from B if and only if S separates A from B in (G_an(A∪B∪S))^m.
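Slide 14's characterization gives a direct recipe for testing d-separation: take the ancestral set of A ∪ B ∪ S, moralize, and check ordinary separation. A sketch with networkx (function names mine, not from the slides):

```python
import itertools
import networkx as nx

def moralize(D):
    """Moral graph of a DAG: marry the parents of each child, drop directions."""
    M = D.to_undirected()
    for v in D:
        M.add_edges_from(itertools.combinations(D.predecessors(v), 2))
    return M

def d_separated(D, A, B, S):
    """S d-separates A from B in the DAG D  iff  S separates A from B in the
    moral graph of the subgraph induced by the ancestral set of A, B, and S."""
    anc = set(A) | set(B) | set(S)
    for v in list(anc):
        anc |= nx.ancestors(D, v)
    H = moralize(D.subgraph(anc))
    H.remove_nodes_from(S)
    return not any(nx.has_path(H, a, b) for a in A for b in B)

# Collider A -> C <- B:
D = nx.DiGraph([("A", "C"), ("B", "C")])
print(d_separated(D, ["A"], ["B"], []))     # True: the head-to-head node blocks the chain
print(d_separated(D, ["A"], ["B"], ["C"]))  # False: conditioning on C opens it
```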

  15. UG – ADG Intersection
[Figure: small example graphs contrasting what UGs and ADGs can encode, including a chain on A, B, C encoding A ⊥ C | B, and a four-cycle on A, B, C, D encoding A ⊥ B | C, D and C ⊥ D | A, B.]

  16. UG – ADG Intersection
[Figure: Venn diagram of UG models, ADG models, and their intersection, the decomposable models.]
• A UG model is decomposable if the graph is chordal
• An ADG model is decomposable if the graph is moral (moralization adds no edges)
• Decomposable ~ closed-form log-linear models
• No CI5

  17. Chordal Graphs and RIP
• Chordal graphs (uniquely) admit clique orderings that have the Running Intersection Property:
{V,T}, {A,L,T}, {L,A,B}, {S,L,B}, {A,B,D}, {A,X}
• The intersection of each clique with the union of those earlier in the list is fully contained in one previous clique
[Figure: the chordal graph on vertices V, T, L, A, S, X, D, B whose cliques these are.]
• Conditional probabilities (e.g. Pr(X|V)) can then be computed by message passing (Lauritzen & Spiegelhalter; Dawid; Jensen)
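A short sketch (the function name is mine) that checks the Running Intersection Property for the clique ordering listed above:

```python
def has_rip(cliques):
    """RIP: for each clique C_j (j > 1), its intersection with the union of
    all earlier cliques is contained in a single earlier clique."""
    for j in range(1, len(cliques)):
        earlier = set().union(*cliques[:j])
        sep = set(cliques[j]) & earlier
        if not any(sep <= set(c) for c in cliques[:j]):
            return False
    return True

order = [{"V", "T"}, {"A", "L", "T"}, {"L", "A", "B"},
         {"S", "L", "B"}, {"A", "B", "D"}, {"A", "X"}]
print(has_rip(order))  # True -- the ordering from the slide satisfies RIP
```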

  18. Probabilistic Expert Systems
Revisiting the earlier objections (computationally intractable; inscrutable; requires vast amounts of data/elicitation):
• Chordal UG models facilitate fast inference
• ADG models are better suited to expert-system applications: it is more natural to specify Pr(v | pa(v))

  19. Factorizations
UG Global Markov Property ⇔ f(x) = ∏_{C ∈ 𝒞} ψ_C(x_C)
ADG Global Markov Property ⇔ f(x) = ∏_{v ∈ V} f(x_v | x_pa(v))

  20. A Lauritzen-Spiegelhalter Algorithm
• Moralize
• Triangulate
• Assign each conditional probability table to a clique containing its variables:
ψ(C,S,D) ← Pr(S|C,D); ψ(A,E) ← Pr(E|A) Pr(A); ψ(C,E) ← Pr(C|E); ψ(F,D,B) ← Pr(D|F) Pr(B|F) Pr(F); ψ(D,B,S) ← 1; ψ(B,S,G) ← Pr(G|S,B); ψ(H,S) ← Pr(H|S)
[Figure: the ADG on A, B, C, D, E, F, G, H, S, its moral graph, and the triangulated graph whose cliques these are.]
The algorithm is widely deployed in commercial software.
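Moralization and a naive elimination-based triangulation can be sketched with networkx (the elimination order here is arbitrary and the function names are mine, not the slides' notation):

```python
import itertools
import networkx as nx

def moralize(D):
    """Marry the parents of each child, then drop edge directions."""
    M = D.to_undirected()
    for v in D:
        M.add_edges_from(itertools.combinations(D.predecessors(v), 2))
    return M

def triangulate(G, order):
    """Naive triangulation: eliminate vertices along `order`, joining the
    remaining neighbours of each eliminated vertex (fill-in edges)."""
    work, tri = G.copy(), G.copy()
    for v in order:
        fill = list(itertools.combinations(work.neighbors(v), 2))
        work.add_edges_from(fill)
        tri.add_edges_from(fill)
        work.remove_node(v)
    return tri

# Small example DAG (a diamond), not the network from the slide:
D = nx.DiGraph([("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")])
M = moralize(D)                           # marries B and C (parents of D)
T = triangulate(M, ["A", "B", "C", "D"])
print(sorted(M.edges()), nx.is_chordal(T))  # moral-graph edges; True
```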

  21. L&S Toy Example
Chain A → B → C with Pr(A) = 0.7, Pr(B|A) = 0.5, Pr(B|¬A) = 0.1, Pr(C|B) = 0.2, Pr(C|¬B) = 0.6
Clique potentials: ψ(A,B) ← Pr(B|A) Pr(A); ψ(B,C) ← Pr(C|B)

  ψ(A,B)    B      ¬B        separator ψ(B)    B    ¬B        ψ(B,C)    C     ¬C
  A        0.35    0.35                        1     1          B       0.2    0.8
  ¬A       0.03    0.27                                         ¬B      0.6    0.4

Message schedule: AB → BC, then BC → AB. After the AB → BC message the separator holds Pr(B) = (0.38, 0.62) and

  ψ(B,C)    C       ¬C
  B        0.076    0.304
  ¬B       0.372    0.248

To answer the query Pr(A|C), enter the evidence C (zero the ¬C column)

  ψ(B,C)    C       ¬C
  B        0.076    0
  ¬B       0.372    0

and pass the message back from BC to AB.
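The toy numbers are small enough to verify by brute-force enumeration of the joint; this sketch simply enumerates, whereas the slide obtains the same quantities by message passing:

```python
import itertools

pA = {1: 0.7, 0: 0.3}
pB = {1: {1: 0.5, 0: 0.5}, 0: {1: 0.1, 0: 0.9}}   # pB[a][b] = Pr(B=b | A=a)
pC = {1: {1: 0.2, 0: 0.8}, 0: {1: 0.6, 0: 0.4}}   # pC[b][c] = Pr(C=c | B=b)
joint = {(a, b, c): pA[a] * pB[a][b] * pC[b][c]
         for a, b, c in itertools.product((0, 1), repeat=3)}

# Clique marginal over (B, C) -- reproduces the 0.076 / 0.372 entries:
pBC = {(b, c): joint[(0, b, c)] + joint[(1, b, c)] for b in (0, 1) for c in (0, 1)}
print(round(pBC[(1, 1)], 3), round(pBC[(0, 1)], 3))   # 0.076 0.372

# Query Pr(A = 1 | C = 1) by conditioning on the evidence:
pC1 = sum(p for (a, b, c), p in joint.items() if c == 1)
pA1C1 = sum(p for (a, b, c), p in joint.items() if a == 1 and c == 1)
print(round(pA1C1 / pC1, 3))                          # 0.625
```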

  22. Other Theoretical Developments
Do the UG and ADG global Markov properties identify all the conditional independences implied by the corresponding factorizations? Yes: completeness for ADGs by Geiger and Pearl (1988), for UGs by Frydenberg (1988).
Graphical characterization of collapsibility in hierarchical log-linear models (Asmussen and Edwards, 1983)

  23. Bayesian Learning for ADGs
• Example: three binary variables in a chain A → B → C
• Five parameters: Pr(A), Pr(B|A), Pr(B|¬A), Pr(C|B), Pr(C|¬B)

  24. Local and Global Independence

  25. Bayesian learning Consider a particular state pa(v)+ of pa(v)
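The transcript stops here, but the per-parent-state setup above is where the standard conjugate (Beta/Dirichlet) analysis usually enters; the following is a sketch of that assumed setup, not necessarily what the remaining slides do:

```python
def beta_update(alpha, beta, n1, n0):
    """Conjugate update for theta = Pr(X_v = 1 | pa(v) = pa(v)+):
    start from a Beta(alpha, beta) prior and add the counts n1 / n0 of cases
    in which pa(v) took the state pa(v)+ and X_v was 1 / 0."""
    return alpha + n1, beta + n0

a, b = beta_update(1, 1, n1=7, n0=3)   # uniform prior, 10 relevant cases
print(a / (a + b))                     # posterior mean = 8/12, about 0.667
```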
