
Introduction to Bayesian Belief Nets


  1. Introduction to Bayesian Belief Nets Russ Greiner Dep’t of Computing Science Alberta Ingenuity Centre for Machine Learning University of Alberta http://www.cs.ualberta.ca/~greiner/bn.html

  2. Motivation
  • Gates says [LATimes, 28/Oct/96]: Microsoft’s competitive advantage is its expertise in “Bayesian networks”
  • Current Products
    • Microsoft Pregnancy and Child Care (MSN)
    • Answer Wizard (Office 95, Office 2000)
    • Print Troubleshooter
    • Excel Workbook Troubleshooter
    • Office 95 Setup Media Troubleshooter
    • Windows NT 4.0 Video Troubleshooter
    • Word Mail Merge Troubleshooter

  3. Motivation (II)
  • US Army: SAIP (Battalion Detection from SAR, IR, … Gulf War)
  • NASA: Vista (DSS for Space Shuttle)
  • GE: Gems (real-time monitor for utility generators)
  • Intel: (infer possible processing problems from end-of-line tests on semiconductor chips)
  • KIC:
    • medical: sleep disorders, pathology, trauma care, hand and wrist evaluations, dermatology, home-based health evaluations
    • DSS for capital equipment: locomotives, gas-turbine engines, office equipment

  4. Motivation (III) • Lymph-node pathology diagnosis • Manufacturing control • Software diagnosis • Information retrieval • Types of tasks • Classification/Regression • Sensor Fusion • Prediction/Forecasting

  5. Outline • Existing uses of Belief Nets (BNs) • How to reason with BNs • Specific Examples of BNs • Contrast with Rules, Neural Nets, … • Possible applications of BNs • Challenges • How to reason efficiently • How to learn BNs

  6. [Diagram: a patient interview (“Blah blah ouch yak ouch blah …”) feeding into Symptoms (chief complaint; history, …), Signs (physical exam; test results, …), and Plan (diagnosis; treatment, …)]

  7. Objectives: Decision Support System
  • Determine
    • which tests to perform
    • which repair to suggest, based on costs, sensitivity/specificity, …
  • Use all sources of information
    • symbolic (discrete observations, history, …)
    • signal (from sensors)
  • Handle partial information
  • Adapt to track fault distribution

  8. Underlying Task
  • Situation: Given observations {O1=v1, …, Ok=vk} (symptoms, history, test results, …), what is the best DIAGNOSIS Dxi for the patient?
  • Approach 1: Use a set of “obs1 & … & obsm → Dxi” rules, but… need a rule for each situation:
    • for each diagnosis Dxi
    • for each set of possible values vj for the Oj
    • for each subset of observations {Ox1, Ox2, …} ⊆ {Oj}
    Eg, can’t use “If Temp > 100 & BP = High & Cough = Yes → DiseaseX” if we only know Temp and BP
  • Seldom completely certain

  9. Underlying Task, II
  • Situation: Given observations {O1=v1, …, Ok=vk} (symptoms, history, test results, …), what is the best DIAGNOSIS Dxi for the patient?
  • Approach 2: Compute the probability of each Dxi given the observations {obsj}: P(Dx = u | O1=v1, …, Ok=vk)
  • Challenge: How to express these probabilities?

  10. How to deal with Probabilities
  • Sufficient: “atomic events” P(Dx=u, O1=v1, …, ON=vN), for all 2^(1+N) values u ∈ {T,F}, vj ∈ {T,F}:
    P(Dx=T, O1=T, O2=T, …, ON=T) = 0.03
    P(Dx=T, O1=T, O2=T, …, ON=F) = 0.4
    …
    P(Dx=T, O1=F, O2=F, …, ON=T) = 0
    …
    P(Dx=F, O1=F, O2=F, …, ON=F) = 0.01
  • Then:
    Marginalize: P(Dx=u, O1=v1, …, Ok=vk) = Σ_{v(k+1),…,vN} P(Dx=u, O1=v1, …, Ok=vk, …, ON=vN)
    Conditionalize: P(Dx=u | O1=v1, …, Ok=vk) = P(Dx=u, O1=v1, …, Ok=vk) / P(O1=v1, …, Ok=vk)
  • But… even with a binary Dx and 20 binary observations → 2^21 (> 2,097,000) numbers!
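
To make the two operations concrete, here is a minimal Python sketch (the uniform joint-table numbers are placeholders of my own, not values from the slides): one probability is stored per atomic event, and every query is answered by summing.

```python
import itertools

# Toy joint distribution over (Dx, O1, ..., ON): one probability per
# atomic event.  Uniform placeholder values, NOT the slide's numbers.
N = 3
joint = {a: 1.0 / 2 ** (N + 1)
         for a in itertools.product([0, 1], repeat=N + 1)}

def marginal(dx=None, obs=None):
    """Marginalize: P(Dx=dx, O_i=v for (i, v) in obs), summing the joint
    over every variable left unspecified.  obs maps the 1-based
    observation index to its value."""
    obs = obs or {}
    return sum(p for a, p in joint.items()
               if (dx is None or a[0] == dx)
               and all(a[i] == v for i, v in obs.items()))

# Conditionalize: P(Dx=1 | O1=0, O3=1) = marginal with Dx / evidence marginal
posterior = marginal(dx=1, obs={1: 0, 3: 1}) / marginal(obs={1: 0, 3: 1})
print(posterior)  # 0.5 for the uniform placeholder table
```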

  11. Problems with “Atomic Events” → Belief Nets
  • Representation is not intuitive
    ⇒ should make “connections” explicit; use “local” information: P(Jaundice | Hepatitis), P(LightDim | BadBattery), …
  • Too many numbers – O(2^N)
    • hard to store
    • hard to use [must sum 2^r values to marginalize out r variables]
    • hard to learn [takes O(2^N) samples to learn 2^N parameters]
    ⇒ include only the necessary “connections”

  12. [Diagram: queries on the Hepatitis example: “Hepatitis?”; “Hepatitis, given not Jaundiced but a positive BloodTest?” (nodes: Hepatitis, Jaundiced, BloodTest)]

  13. Hepatitis Example
  • (Boolean) Variables: H Hepatitis, J Jaundice, B (positive) Blood test
  • Want P(H=1 | J=0, B=1), …, P(H=1 | B=1, J=1), P(H=1 | B=0, J=0), …
  • Option 1: store the full joint, then marginalize/conditionalize to get P(H=1 | J=0, B=1):

    J B H   P(J, B, H)
    0 0 0   0.03395
    0 0 1   0.0095
    0 1 0   0.00105
    0 1 1   0.1805
    1 0 0   0.01455
    1 0 1   0.038
    1 1 0   0.00045
    1 1 1   0.722

  • Alternatively…
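
A quick Python check of Option 1 (the dictionary simply transcribes the table above):

```python
# P(H=1 | J=0, B=1) by direct conditionalization on the joint table.
joint = {  # (J, B, H) -> P(J, B, H)
    (0, 0, 0): 0.03395, (0, 0, 1): 0.0095,
    (0, 1, 0): 0.00105, (0, 1, 1): 0.1805,
    (1, 0, 0): 0.01455, (1, 0, 1): 0.038,
    (1, 1, 0): 0.00045, (1, 1, 1): 0.722,
}
evidence = sum(p for (j, b, h), p in joint.items() if j == 0 and b == 1)
print(round(joint[(0, 1, 1)] / evidence, 4))  # P(H=1 | J=0, B=1) ≈ 0.9942
```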

  14. Encoding Causal Links
  • Simple Belief Net (H → B, H → J, B → J):
    Node ~ Variable; Link ~ “Causal dependency”; “CPTable” ~ P(child | parents)

    H:  P(H=1)  P(H=0)
        0.05    0.95

    B:  h   P(B=1 | H=h)   P(B=0 | H=h)
        1   0.95           0.05
        0   0.03           0.97

    J:  h b   P(J=1 | h,b)   P(J=0 | h,b)
        1 1   0.8            0.2
        1 0   0.8            0.2
        0 1   0.3            0.7
        0 0   0.3            0.7

  15. Encoding Causal Links (H → B, H → J, B → J)
  • P(J | H, B=0) = P(J | H, B=1) for all J, H, i.e. P(J | H, B) = P(J | H)
  • J is INDEPENDENT of B, once we know H
  • Don’t need the B → J arc!


  18. Sufficient Belief Net (H → B, H → J)
  • Hence: P(H=1 | J=0, B=1) ∝ P(H=1, J=0, B=1) = P(H=1) · P(J=0 | H=1) · P(B=1 | J=0, H=1) = P(H=1) · P(J=0 | H=1) · P(B=1 | H=1)
  • Requires that P(H=1), P(J=1 | H=1), P(J=1 | H=0), P(B=1 | H=1), P(B=1 | H=0) be known
  • (Only 5 parameters, not 7)
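
A minimal Python sketch of this factored computation, normalizing over H to turn the joint into the conditional. The five CPT values below are chosen to reproduce the slide-13 joint table (which implies P(H=1) = 0.95):

```python
# Factored ("sufficient") computation of P(H=1 | J=0, B=1).
P_H1 = 0.95                 # P(H=1), as implied by the slide-13 joint
P_J1 = {1: 0.8, 0: 0.3}     # P(J=1 | H=h)
P_B1 = {1: 0.95, 0: 0.03}   # P(B=1 | H=h)

def joint(h, j, b):
    """Chain rule, with J independent of B given H:
    P(H=h) * P(J=j | H=h) * P(B=b | H=h)."""
    return ((P_H1 if h else 1 - P_H1)
            * (P_J1[h] if j else 1 - P_J1[h])
            * (P_B1[h] if b else 1 - P_B1[h]))

# Conditionalize by normalizing over the two values of H:
posterior = joint(1, 0, 1) / (joint(0, 0, 1) + joint(1, 0, 1))
print(round(posterior, 4))  # ≈ 0.9942, matching the joint-table answer
```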

  19. “Factoring”
  • B does depend on J: if J=1, then it is likely that H=1, and hence that B=1
  • but ONLY THROUGH H: if we know H=1, then B=1 is likely, and it doesn’t matter whether J=1 or J=0!
    ⇒ P(B=1 | J=0, H=1) = P(B=1 | H=1)
  • N.b., B and J ARE correlated a priori: P(B | J) ≠ P(B)
    GIVEN H, they become uncorrelated: P(B | J, H) = P(B | H)

  20. Factored Distribution (H Hepatitis → J Jaundice, H → B (positive) Blood test)
  • Symptoms independent, given Disease: P(B | J) ≠ P(B), but P(B | J, H) = P(B | H)
  • Eg (Age → ShoeSize, Age → Reading): ReadingAbility and ShoeSize are dependent, P(ReadAbility | ShoeSize) ≠ P(ReadAbility), but become independent, given Age: P(ReadAbility | ShoeSize, Age) = P(ReadAbility | Age)

  21. “Naïve Bayes”
  • Structure: H → O1, O2, …, On
  • Given P(H = hi) and P(Oj = vj | H = hi); independence: P(Oj | H, Ok, …) = P(Oj | H)
  • Classification Task: Given {O1 = v1, …, On = vn}, find the hi that maximizes P(H = hi | O1 = v1, …, On = vn)
  • Find argmax_{hi} P(H = hi) · Πj P(Oj = vj | H = hi)
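
A minimal Naïve Bayes classifier sketch in Python. The hypothesis names and all probabilities below are illustrative assumptions of mine, not numbers from the deck:

```python
priors = {"healthy": 0.9, "sick": 0.1}    # P(H = h_i)
cond = {"healthy": [0.05, 0.2],           # P(O_j = 1 | H = h_i), binary O_j
        "sick":    [0.80, 0.6]}

def classify(obs):
    """Return argmax_h P(h) * prod_j P(O_j = v_j | h)."""
    def score(h):
        s = priors[h]
        for p, v in zip(cond[h], obs):
            s *= p if v == 1 else 1 - p   # P(O_j = v_j | h)
        return s                          # proportional to P(h | obs)
    return max(priors, key=score)

print(classify([1, 1]))  # -> "sick"
```

The scores are only proportional to the posterior; the normalizing term is skipped because, as the next slide points out, it is the same for every hi.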

  22. Naïve Bayes (con’t)
  • If k Dx’s and n Oi’s, requires only k priors and n·k pairwise conditionals (not 2^(n+k) … relatively easy to learn)

    n    1+2n   2^(n+1) − 1
    10   21     2,047
    30   61     2,147,483,647

  • Easy to use for Classification (no need to compute the normalizing term, as it is the same for all hi)
  • Can use even if some vj’s are not specified

  23. Bigger Networks (GeneticPH, LiverTrauma, Hepatitis, Jaundice, Bloodtest)
  • Intuition: show CAUSAL connections: GeneticPH CAUSES Hepatitis; Hepatitis CAUSES Jaundice
  • If GeneticPH, then expect Jaundice: GeneticPH → Hepatitis → Jaundice
  • But only via Hepatitis: GeneticPH without Hepatitis ↛ Jaundice
  • P(J | GeneticPH) ≠ P(J), but P(J | GeneticPH, H) = P(J | H)

  24. Belief Nets
  • DAG structure (nodes here: D, I, H, J, B)
  • Each node ~ a variable v, plus a conditional probability P(vi | parentsi = 0, 1, …)
  • v depends (only) on its parents: vi is INDEPENDENT of its non-descendants, given assignments to its parents
  • Eg, given H = 1: D has no influence on J, J has no influence on B, etc.

  25. Less Trivial Situations
  • N.b., obs1 is not always independent of obs2 given H
  • Eg, FamilyHistoryDepression (FHD) ‘causes’ MotherSuicide (MS) and Depression (D); MotherSuicide causes Depression (w/ or w/o F.H.Depression)
  • Here, P(D | MS, FHD) ≠ P(D | FHD)!
  • Can be done using a Belief Network, but need to specify:
    • P(FHD): 1 value
      P(FHD=1) = 0.001
    • P(MS | FHD): 2 values
      f   P(MS=1 | FHD=f)
      1   0.10
      0   0.03
    • P(D | MS, FHD): 4 values
      f m   P(D=1 | FHD=f, MS=m)
      1 1   0.97
      1 0   0.90
      0 1   0.08
      0 0   0.04
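
A short Python sketch making the point numerically with the CPTs above: because of the MS → D arc, D is not independent of MS given FHD:

```python
P_MS1 = {1: 0.10, 0: 0.03}                    # P(MS=1 | FHD=f)
P_D1 = {(1, 1): 0.97, (1, 0): 0.90,
        (0, 1): 0.08, (0, 0): 0.04}           # P(D=1 | FHD=f, MS=m)

f = 1
# Marginalize out MS: P(D=1 | FHD=1) = sum_m P(MS=m | FHD=1) * P(D=1 | FHD=1, MS=m)
p_d = sum((P_MS1[f] if m else 1 - P_MS1[f]) * P_D1[(f, m)] for m in (0, 1))
print(round(p_d, 3))  # 0.907
print(P_D1[(f, 1)])   # 0.97 != 0.907, so P(D | FHD, MS) != P(D | FHD)
```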

  26. Example: Car Diagnosis

  27. MammoNet

  28. ALARM • A Logical Alarm Reduction Mechanism • 8 diagnoses, 16 findings, …

  29. Troop Detection

  30. ARCO1: Forecasting Oil Prices

  31. ARCO1: Forecasting Oil Prices

  32. Forecasting Potato Production

  33. Warning System

  34. Extensions
  • Find best values (posterior distribution) for SEVERAL (>1) “output” variables
  • Partial specification of “input” values
    • only a subset of the variables
    • only a “distribution” over each input variable
  • General Variables
    • discrete, but domain size > 2
    • continuous (Gaussian: x = Σi bi yi for parents {Yi})
  • Decision Theory → Decision Nets (Influence Diagrams): making decisions, not just assigning probabilities
  • Storing P(v | p1, p2, …, pk): general “CP Tables” are O(2^k) → Noisy-Or, Noisy-And, Noisy-Max, “Decision Trees” (see the sketch below)
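
A minimal sketch of the Noisy-Or idea mentioned above: each “true” parent independently manages to cause the child with its own probability, so a k-parent node needs only k parameters (plus an optional leak) instead of 2^k table rows. The leak term and the numbers are my own illustrative assumptions:

```python
def noisy_or(parent_values, p, leak=0.0):
    """P(child=1 | parents) = 1 - (1 - leak) * prod over true parents of (1 - p_i)."""
    q = 1.0 - leak                  # prob. child stays off with no active cause
    for v, p_i in zip(parent_values, p):
        if v:                       # only "true" parents can trigger the child
            q *= 1.0 - p_i
    return 1.0 - q

# Three causes with individual trigger probabilities 0.8, 0.5, 0.3:
print(noisy_or([1, 0, 1], p=[0.8, 0.5, 0.3], leak=0.01))  # ≈ 0.8614
```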

  35. Outline • Existing uses of Belief Nets (BNs) • How to reason with BNs • Specific Examples of BNs • Contrast with Rules, Neural Nets, … • Possible applications of BNs • Challenges • How to reason efficiently • How to learn BNs

  36. Belief Nets vs Rules
  • Often the same nodes (representing propositions), but
    BN: Cause → Effect (“Hep → Jaundice”, P(J | H))
    Rule: Effect → Cause (“Jaundice → Hep”)
  • Both have “Locality”: specific clusters (rules / connected nodes)
    WHY? Easier for people to reason CAUSALLY, even if the use is DIAGNOSTIC
  • BNs provide an OPTIMAL way to deal with
    + Uncertainty
    + Vagueness (variable not given, or only a distribution)
    + Error
    … signals meeting symbols …
  • BNs permit different “directions” of inference

  37. Belief Nets vs Neural Nets
  • Both have “graph structure”, but
    BN: nodes have SEMANTICS; combination rules: sound probability
    NN: nodes arbitrary; combination rules arbitrary
  • So it is harder to initialize an NN, and to explain an NN (but perhaps easier to learn an NN from examples only?)
  • BNs can deal with partial information, and different “directions” of inference

  38. Belief Nets vs Markov Nets
  • Each uses “graph structure” to FACTOR a distribution … explicitly specifying dependencies, implicitly independencies …
  • Technical: BNs use DIRECTED arcs ⇒ allow “induced dependencies”:
    I(A, {}, B) “A independent of B, given {}”
    ¬I(A, C, B) “A dependent on B, given C”
    [eg, the v-structure A → C ← B]
  • MNs use UNDIRECTED arcs ⇒ allow other independencies:
    I(A, {B,C}, D) “A independent of D, given B, C”
    I(B, {A,D}, C) “B independent of C, given A, D”
    [eg, the cycle A – B – D – C – A]
  • but subtle differences… BNs capture “causality”, “hierarchies”; MNs capture “temporality”

  39. Uses of Belief Nets #1
  • Medical Diagnosis: “Assist/Critique” MD
    • identify diseases not ruled out
    • specify additional tests to perform
    • suggest appropriate/cost-effective treatments
    • react to MD’s proposed treatment
  • Decision Support: Find/repair faults in complex machines [device, manufacturing plant, …], based on sensors, recorded info, history, …
  • Preventative Maintenance: Anticipate problems in complex machines [device, manufacturing plant, …], based on sensors, statistics, recorded info, device history, …

  40. Uses (con’t)
  • Logistics Support: Stock warehouses appropriately, based on (estimated) frequency of needs, costs, …
  • Diagnose Software: Find most probable bugs, given program behavior, core dump, source code, …
  • Part Inspection/Classification: based on multiple sensors, background, model of production, …
  • Information Retrieval: Combine information from various sources, based on info from various “agents”, …
  • General: partial info, sensor fusion - Classification - Interpretation - Prediction - …

  41. Challenge #1: Computational Efficiency
  • For a given BN, the general problem is:
    Given O1 = v1, …, On = vn, compute P(H | O1 = v1, …, On = vn)
  • + If the BN is a “poly tree” → efficient algorithm
  • − If the BN is a general DAG (>1 path from X to Y):
    − NP-hard in theory
    − slow in practice
  • Tricks: get an approximate answer (quickly)
    + use an abstraction of the BN
    + use an “abstraction” of the query (range)

  42. #2a: Obtaining an Accurate BN
  • A BN encodes a distribution over n variables: not O(2^n) values, but “only” Σi 2^{ki} (node ni binary, with ki parents). Still lots of values!
  • Qualitative information … structure …
    • Structure: “What depends on what?”
    • Easy for people (background knowledge) ⇒ knowledge acquisition from human experts
    • But NP-hard to learn from samples…
  • Quantitative information … values …
    • the actual CP-tables
    • Easy to learn, given lots of examples ⇒ simple learning algorithm
    • But people have a hard time…
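
A tiny Python illustration of the Σi 2^{ki} count (the five-node structure is a made-up example echoing the D, I, H, J, B net of slide 24, not a structure given in the deck):

```python
# CP-table rows for binary nodes: 2^{k_i} per node, vs 2^n - 1 for the joint.
parents = {"H": [], "D": ["H"], "I": ["H"], "J": ["H"], "B": ["H"]}

n = len(parents)
bn_params = sum(2 ** len(ps) for ps in parents.values())  # 1 + 2 + 2 + 2 + 2 = 9
joint_params = 2 ** n - 1                                 # 2^5 - 1 = 31
print(bn_params, "vs", joint_params)                      # 9 vs 31
```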

  43. Notes on Learning (structure + values)
  • Just Learning Algorithm: algorithms learn both structure and values from samples
  • Mixed Sources: person provides the structure; algorithm fills in the numbers
  • Just Human Expert: people produce the CP-tables, as well as the structure
  • Relatively few values really required, esp. if NoisyOr, NoisyAnd, NaiveBayes, …
  • Actual values not that important … sensitivity studies

  44. My Current Work
  • Learning Belief Nets
    • Model selection: challenging the myth that MDL is the appropriate criterion
    • Learning the “performance system”, not the model
    • Validating Belief Nets: “error bars” around answers
  • Adaptive User Interfaces
  • Efficient Vision Systems
  • Foundations of Learnability
    • Learning active classifiers
    • Sequential learners
  • Condition-based maintenance, bio-signal interpretation, …

  45. #2b: Maintaining an Accurate BN
  • The world changes. Information in a BN may be
    • perfect at time t
    • sub-optimal at time t + 20
    • worthless at time t + 200
  • Need to MAINTAIN a BN over time, using an on-going human consultant
  • Adaptive BN:
    • Dirichlet distributions (over variables)
    • priors over BNs

  46. Conclusions • Belief Nets are PROVEN TECHNOLOGY • Medical Diagnosis • DSS for complex machines • Forecasting, Modeling, InfoRetrieval… • Provide effective way to • Represent complicated, inter-related events • Reason about such situations • Diagnosis, Explanation, ValueOfInfo • Explain conclusions • Mix Symbolic and Numeric observations • Challenges • Efficient ways to use BNs • Ways to create BNs • Ways to maintain BNs • Reason about time

  47. Extra Slides • AI Seminar • Friday, noon, CSC3-33 • Free PIZZA! • http://www.cs.ualberta.ca/~ai/ai-seminar.html • References http://www.cs.ualberta.ca/~greiner/bn.html • Crusher Controller • Formal Framework • Decision Nets • Developing the Model • Why Reasoning is Hard • Learning Accurate Belief Nets

  48. References
  • http://www.cs.ualberta.ca/~greiner/bn.html
  • Overview textbooks:
    • Judea Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, 1988.
    • Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, Prentice Hall, 1995. (See esp. Ch 14, 15, 19.)
  • General info re BayesNets: http://www.afit.af.mil:80/Schools/EN/ENG/LABS/AI/BayesianNetworks
  • Proceedings: http://www.sis.pitt.edu/~dsl/uai.html
  • Assoc. for Uncertainty in AI: http://www.auai.org/
  • Learning: David Heckerman, A tutorial on learning with Bayesian networks, 1995, http://www.research.microsoft.com/research/dtg/heckerma/TR-95-06.htm
  • Software:
    • General: http://bayes.stat.washington.edu/almond/belief.html
    • JavaBayes: http://www.cs.cmu.edu/~fgcozman/Research/JavaBaye
    • Norsys: http://www.norsys.com/

  49. Decision Net: Test/Buy a Car
