
Bayesian Belief Networks Chapter 2 (Duda et al.) – Section 2.11






Presentation Transcript


  1. Bayesian Belief Networks, Chapter 2 (Duda et al.) – Section 2.11. CS479/679 Pattern Recognition, Dr. George Bebis

  2. Statistical Dependences Between Variables • Many times, the only knowledge we have about a distribution is which variables are or are not dependent. • Such dependencies can be represented efficiently using a Bayesian Belief Network (or Belief Net or Bayesian Net). • Bayesian Nets allow us to represent a joint probability density p(x, y, z, …) efficiently using dependency relationships.

  3. Example of Dependencies • State of an automobile • Engine temperature • Brake fluid pressure • Tire air pressure • Wire voltages • Causally related variables • Coolant temperature • Engine temperature • NOT causally related variables • Engine oil pressure • Tire air pressure

  4. Applications • Microsoft (Answer Wizard, Print Troubleshooter) • US Army: SAIP (Battalion Detection from SAR, IR etc.) • NASA: Vista (DSS for Space Shuttle) • GE: Gems (real-time monitoring of utility generators)

  5. Definitions and Notation • A belief net is usually a Directed Acyclic Graph (DAG) • Each node represents one of the system variables. • Each variable can assume certain states (i.e., values).

  6. Relationships Between Nodes • A link joining two nodes is directional and represents a causal influence (e.g., X depends on A or A influences X) • Influences could be direct or indirect (e.g., A influences X directly and A influences C indirectly through X).

  7. Conditional Probability Tables • Each variable is associated with a set of weights which represent prior or conditional probabilities (discrete or continuous). • Within each row of a conditional probability table (CPT), the probabilities sum to 1.
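  For illustration, a CPT for a variable X with states {x1, x2} and parents A and B might look like the following (the numbers are hypothetical placeholders, not the values from the lecture's figure):

    A    B    P(x1 | A, B)   P(x2 | A, B)
    a1   b1       0.9            0.1
    a1   b2       0.5            0.5
    a2   b1       0.4            0.6
    a2   b2       0.2            0.8

  Each row sums to 1.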

  8. Parent/Children Nodes • Parent nodes P of X: the nodes before X (connected to X by incoming links) • Children nodes C of X: the nodes after X (X is connected to them by outgoing links) • Markov property: each node is conditionally independent of its ancestors given its parents.

  9. Computing Joint Probabilities • Using the chain rule, the joint probability of a set of variables x1, x2, …, xn is given as: p(x1, x2, …, xn) = p(xn | xn-1, …, x1) p(xn-1 | xn-2, …, x1) … p(x1) • The conditional independence relationships encoded in the Bayesian network state that a node xi is conditionally independent of its ancestors given its parents πi, so the product collapses to: p(x1, x2, …, xn) = Πi p(xi | πi) – much simpler!

  10. Computing Joint Probabilities (cont’d) • We can compute the probability of any configuration of variables in the joint density distribution, e.g.: P(a3, b1, x2, c3, d2) = P(a3) P(b1) P(x2 | a3, b1) P(c3 | x2) P(d2 | x2) = 0.25 × 0.6 × 0.4 × 0.5 × 0.4 = 0.012
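  As a quick check, here is a minimal Python sketch of this computation (the variable names are ours; the numbers are the CPT entries quoted above):

    # Factored joint P(a3, b1, x2, c3, d2) for the network A, B -> X -> C, D.
    P_a3 = 0.25              # prior P(a3)
    P_b1 = 0.6               # prior P(b1)
    P_x2_given_a3_b1 = 0.4   # CPT entry P(x2 | a3, b1)
    P_c3_given_x2 = 0.5      # CPT entry P(c3 | x2)
    P_d2_given_x2 = 0.4      # CPT entry P(d2 | x2)

    joint = P_a3 * P_b1 * P_x2_given_a3_b1 * P_c3_given_x2 * P_d2_given_x2
    print(joint)             # 0.012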

  11. Computing the Probability at a Node • e.g., determine the probability at D: marginalize the factored joint over all the other variables, P(d) = Σa Σb Σx P(a) P(b) P(x | a, b) P(d | x)

  12. Computing the Probability at a Node (cont’d) • e.g., determine the probability at H: in the same way, sum the factored joint over all variables other than H.
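  A hedged sketch of this marginalization in Python, assuming the network structure A, B → X → D used above (all CPT numbers below are hypothetical placeholders, not the values from the lecture's figures):

    # P(d) = sum over a, b, x of P(a) P(b) P(x | a, b) P(d | x).
    P_A = {"a1": 0.7, "a2": 0.3}                       # assumed prior P(A)
    P_B = {"b1": 0.6, "b2": 0.4}                       # assumed prior P(B)
    P_X_given_AB = {                                   # assumed CPT P(X | A, B)
        ("a1", "b1"): {"x1": 0.9, "x2": 0.1},
        ("a1", "b2"): {"x1": 0.5, "x2": 0.5},
        ("a2", "b1"): {"x1": 0.4, "x2": 0.6},
        ("a2", "b2"): {"x1": 0.2, "x2": 0.8},
    }
    P_D_given_X = {"x1": {"d1": 0.3, "d2": 0.7},       # assumed CPT P(D | X)
                   "x2": {"d1": 0.8, "d2": 0.2}}

    def prob_D(d):
        return sum(P_A[a] * P_B[b] * P_X_given_AB[(a, b)][x] * P_D_given_X[x][d]
                   for a in P_A for b in P_B for x in ("x1", "x2"))

    print(prob_D("d1"))   # marginal probability of state d1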

  13. Two Fundamental Problems in Bayesian Nets • Evaluation (inference) problem: Given the model and the values of the observed variables (evidence), estimate the values of the hidden nodes. • Learning problem: Given training data and prior information (e.g., expert knowledge, causal relationships), estimate the network structure, or the parameters of the probability distributions, or both.

  14. Evaluation (Inference) Problem • In general, if X denotes the query variables and e denotes the evidence, then P(X | e) = α Σy P(X, e, y), where the sum (or integral) runs over the hidden variables y and α = 1/P(e) is a constant of proportionality.
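  As a minimal sketch (the function name and data layout are our assumptions), inference by enumeration can be written directly from this formula:

    # P(X | e): sum the full joint over the hidden variables, then normalize.
    # 'joint' maps complete assignments (tuples of values) to probabilities.
    def enumerate_query(joint, query_idx, evidence):
        """evidence: {position: required value}; returns P(X | e) as a dict."""
        scores = {}
        for assignment, p in joint.items():
            if all(assignment[i] == v for i, v in evidence.items()):
                x = assignment[query_idx]
                scores[x] = scores.get(x, 0.0) + p
        alpha = 1.0 / sum(scores.values())        # alpha = 1 / P(e)
        return {x: alpha * s for x, s in scores.items()}

    # Tiny usage example: two binary variables (X at position 0, E at position 1).
    joint = {(True, True): 0.2, (True, False): 0.3,
             (False, True): 0.1, (False, False): 0.4}
    print(enumerate_query(joint, 0, {1: True}))   # P(X | E = true)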

  15. Evaluation (Inference) Problem (cont’d) • Prediction, or top-down reasoning: observing the “roots” and trying to predict the effects. • Diagnosis, or bottom-up reasoning: observing the “leaves” and trying to infer the values of the hidden causes. • Exact inference is an NP-hard problem because the number of terms in the summations (for discrete variables) or integrals (for continuous variables) grows exponentially with the number of variables.

  16. Evaluation (Inference) Problem (cont’d) • Some restricted classes of networks (e.g., singly connected networks, where there is no more than one path between any two nodes) can be solved efficiently, in time linear in the number of nodes. • In most other cases, however, approximate inference methods must be used, such as: • sampling (Monte Carlo) methods • variational methods • loopy belief propagation
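  For instance, here is a tiny sketch of the first option, rejection sampling, on a two-node net A → B (all numbers are assumed for illustration):

    import random

    P_A = {True: 0.3, False: 0.7}            # assumed prior P(A)
    P_B_given_A = {True: 0.9, False: 0.2}    # assumed P(B = true | A)

    def estimate_P_A_given_B(n=100_000):
        kept = hits = 0
        for _ in range(n):
            a = random.random() < P_A[True]
            b = random.random() < P_B_given_A[a]
            if b:                            # keep only samples matching the evidence B = true
                kept += 1
                hits += a
        return hits / kept                   # approximates P(A = true | B = true)

    print(estimate_P_A_given_B())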

  17. Example • Classify a fish given that the fish is light (c1) and was caught in the South Atlantic (b2); there is no evidence about the time of year the fish was caught or about its thickness.

  18. Example (cont’d)

  19. Example (cont’d)

  20. Example (cont’d) • Similarly, P(x2 | c1, b2) = α · 0.066 • Normalize the probabilities (not strictly necessary for classification): P(x1 | c1, b2) + P(x2 | c1, b2) = 1 gives α = 1/(0.18 + 0.066) = 1/0.246, so P(x1 | c1, b2) ≈ 0.73 and P(x2 | c1, b2) ≈ 0.27 → salmon
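  The normalization step in code, using the unnormalized scores quoted on the slides:

    scores = {"x1": 0.18, "x2": 0.066}        # unnormalized P(xi | c1, b2) up to alpha
    alpha = 1.0 / sum(scores.values())        # alpha = 1 / 0.246
    posterior = {k: alpha * v for k, v in scores.items()}
    print(posterior)                          # x1 ≈ 0.73 (salmon), x2 ≈ 0.27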

  21. Another Example • You have a new burglar alarm installed at home. • It is fairly reliable at detecting burglary, but also sometimes responds to minor earthquakes. • You have two neighbors, Ali and Veli, who promised to call you at work when they hear the alarm.

  22. Another Example (cont’d) • Ali always calls when he hears the alarm, but sometimes confuses telephone ringing with the alarm and calls too. • Veli likes loud music and sometimes misses the alarm. • Given the evidence of who has or has not called, we would like to estimate the probability of a burglary.

  23. Another Example (cont’d) • Design the structure of the Bayesian net. • What are the main variables? • Alarm, Burglary, Earthquake, Ali calls, Veli calls • What are the conditional dependencies among them? • Burglary (B) and earthquake (E) directly affect the probability of the alarm (A) going off • Whether or not Ali calls (AC) or Veli calls (VC) depends only on the alarm.

  24. Another Example (cont’d)

  25. Another Example (cont’d) • What is the probability that the alarm has sounded but neither a burglary nor an earthquake has occurred, and both Ali and Veli call?
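  By the factored joint, this probability is P(a, ¬b, ¬e, ac, vc) = P(ac | a) · P(vc | a) · P(a | ¬b, ¬e) · P(¬b) · P(¬e). For illustration only, with the CPT values commonly used for this textbook example (P(b) = 0.001, P(e) = 0.002, P(a | ¬b, ¬e) = 0.001, P(ac | a) = 0.90, P(vc | a) = 0.70; the lecture's own figure may use different numbers), this gives 0.90 × 0.70 × 0.001 × 0.999 × 0.998 ≈ 0.00063.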

  26. Another Example (cont’d) • What is the probability that there is a burglary given that Ali calls? • What about if Veli also calls right after Ali hangs up?
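  A hedged sketch answering both questions by enumerating the factored joint; the CPT numbers below are the values commonly used for this textbook example, assumed here rather than taken from the lecture's figure:

    from itertools import product

    P_B = 0.001                                              # assumed P(burglary)
    P_E = 0.002                                              # assumed P(earthquake)
    P_A_given = {(True, True): 0.95, (True, False): 0.94,
                 (False, True): 0.29, (False, False): 0.001} # assumed P(alarm | B, E)
    P_AC = {True: 0.90, False: 0.05}                         # assumed P(Ali calls | alarm)
    P_VC = {True: 0.70, False: 0.01}                         # assumed P(Veli calls | alarm)

    def joint(b, e, a, ac, vc):
        p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
        p *= P_A_given[(b, e)] if a else 1 - P_A_given[(b, e)]
        p *= P_AC[a] if ac else 1 - P_AC[a]
        p *= P_VC[a] if vc else 1 - P_VC[a]
        return p

    def prob_burglary(evidence):
        num = den = 0.0
        for b, e, a, ac, vc in product([True, False], repeat=5):
            if any({"ac": ac, "vc": vc}[k] != v for k, v in evidence.items()):
                continue                                     # skip assignments inconsistent with evidence
            p = joint(b, e, a, ac, vc)
            den += p
            num += p if b else 0.0
        return num / den

    print(prob_burglary({"ac": True}))                # P(burglary | Ali calls)
    print(prob_burglary({"ac": True, "vc": True}))    # P(burglary | both call)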

  27. Medical Diagnosis Example • Uppermost nodes: biological agents (bacteria, virus) • Intermediate nodes: diseases • Lowermost nodes: symptoms • Given some evidence (biological agents, symptoms), find the most likely disease.

  28. Learning Problem • The simplest situation is the one where the network structure is completely known (either specified by an expert or designed using causal relationships between the variables). • Other situations, of increasing complexity, are: • known structure but unobserved variables • unknown structure with observed variables, and • unknown structure with unobserved variables. • Together these make up the four cases of Bayesian network learning.

  29. Naïve Bayesian Network • When dependency relationships among the features are unknown, we can assume that the features are conditionally independent given the class: P(x1, x2, …, xd | ωj) = P(x1 | ωj) P(x2 | ωj) … P(xd | ωj) • Simple assumption, but it usually works well in practice! • Naïve Bayesian Network: the class node is the single parent of every feature node.
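  A minimal classification sketch under this assumption, with two binary features in the spirit of the fish example (every number below is a hypothetical placeholder):

    priors = {"salmon": 0.4, "sea_bass": 0.6}        # assumed class priors
    P_light = {"salmon": 0.7, "sea_bass": 0.3}       # assumed P(light | class)
    P_south = {"salmon": 0.6, "sea_bass": 0.4}       # assumed P(south Atlantic | class)

    def posterior(light, south):
        scores = {}
        for c, prior in priors.items():
            p = prior
            p *= P_light[c] if light else 1 - P_light[c]
            p *= P_south[c] if south else 1 - P_south[c]
            scores[c] = p                            # prior times product of per-feature likelihoods
        z = sum(scores.values())
        return {c: s / z for c, s in scores.items()} # normalize over classes

    print(posterior(light=True, south=True))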
