Probabilistic Inference

Probabilistic Inference • Reading: Chapter 13 • Next time: How should we define artificial intelligence? • Reading for next time (see Links, Reading for Retrospective Class): Turing paper; Mind, Brain and Behavior, John Searle • Prepare discussion points by midnight, Wed night (see end of slides)

Transition to empirical AI • Add in • Ability to infer new facts from old • Ability to generalize • Ability to learn based on past observation • Key: • Observation of the world • Best decision given what is known

Overview of Probabilistic Inference • Some terminology • Inference by enumeration • Bayesian Networks

Probability Basics • Sample space • Atomic event • Probability model • An event A

Random Variables • Random variable • Probability for a random variable

Logical Propositions and Probability • Proposition = event (set of sample points) • Given Boolean random variables A and B: • Event a = set of sample points where A(ω)=true • Event ¬a = set of sample points where A(ω)=false • Event a∧b = points where A(ω)=true and B(ω)=true • Often the sample space is the Cartesian product of the ranges of the variables • Proposition = disjunction of the atomic events in which it is true • (a∨b) = (¬a∧b)∨(a∧¬b)∨(a∧b), so P(a∨b) = P(¬a∧b)+P(a∧¬b)+P(a∧b)
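The idea that a proposition is the set of sample points where it holds can be sketched in a few lines. This is a minimal illustration, not part of the original slides: the probabilities in `joint` are made-up values, and the helper names `event` and `prob` are hypothetical.

```python
from itertools import product

# Sample space: Cartesian product of the ranges of Boolean variables A and B.
# The probabilities are ASSUMED illustration values that sum to 1.
joint = {
    (True, True): 0.2,
    (True, False): 0.3,
    (False, True): 0.1,
    (False, False): 0.4,
}

def event(proposition):
    """Return the set of sample points (a, b) where the proposition is true."""
    return {w for w in product([True, False], repeat=2) if proposition(*w)}

def prob(proposition):
    """P(proposition) = sum of the probabilities of its atomic events."""
    return sum(joint[w] for w in event(proposition))

# The event for "a or b" is the union of three atomic events.
a_or_b = event(lambda a, b: a or b)

# P(a or b) computed directly, and as the sum over the three disjoint cases.
p_a_or_b = prob(lambda a, b: a or b)
p_by_cases = (prob(lambda a, b: (not a) and b)
              + prob(lambda a, b: a and (not b))
              + prob(lambda a, b: a and b))
```

Both routes add up the same three atomic events, so `p_a_or_b` and `p_by_cases` agree up to floating-point rounding.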

Axioms of Probability • All probabilities are between 0 and 1 • Necessarily true propositions have probability 1; necessarily false propositions have probability 0 • The probability of a disjunction is P(a∨b)=P(a)+P(b)-P(a∧b) • Consequence: P(¬a)=1-P(a)

The definitions imply that certain logically related events must have related probabilities, e.g., P(a∨b) = P(a)+P(b)-P(a∧b)
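As a quick sanity check, the axioms and the disjunction rule can be verified on a small joint distribution. The numbers below are assumed for illustration only, and exact fractions are used so the equalities hold without rounding error.

```python
from fractions import Fraction as F

# ASSUMED joint over Boolean A and B, keyed by (a, b).
joint = {(True, True): F(1, 5), (True, False): F(3, 10),
         (False, True): F(1, 10), (False, False): F(2, 5)}

def prob(holds):
    """Sum the probabilities of the sample points where `holds` is true."""
    return sum(p for (a, b), p in joint.items() if holds(a, b))

p_a = prob(lambda a, b: a)
p_b = prob(lambda a, b: b)
p_a_and_b = prob(lambda a, b: a and b)
p_a_or_b = prob(lambda a, b: a or b)
p_not_a = prob(lambda a, b: not a)

# Every probability lies between 0 and 1.
assert all(0 <= p <= 1 for p in joint.values())
# A necessarily true proposition has probability 1, a necessarily false one 0.
assert prob(lambda a, b: True) == 1
assert prob(lambda a, b: False) == 0
# Disjunction rule and its consequence for negation.
assert p_a_or_b == p_a + p_b - p_a_and_b
assert p_not_a == 1 - p_a
```

Subtracting P(a∧b) corrects for the sample point (a, b) being counted once in P(a) and once in P(b).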

Prior Probability • Prior or unconditional probabilities of propositions • P(female=true)=.5 corresponds to belief prior to the arrival of any new evidence • A probability distribution gives values for all possible assignments • P(color) = <P(color=green), P(color=blue), P(color=purple)> • P(color)=<.6,.3,.1> (normalized: sums to 1) • The joint probability distribution for a set of r.v.s gives the probability of every atomic event on those r.v.s (i.e., every sample point) • P(color,gender) = a 3×2 matrix
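A small sketch of what a prior and a joint table look like as data. The slides do not give the entries of P(color, gender), so the table here is built under the assumption that color and gender are independent; that assumption is purely for illustration.

```python
# Prior over color, in the order (green, blue, purple) used on the slide.
colors = ["green", "blue", "purple"]
p_color = dict(zip(colors, [0.6, 0.3, 0.1]))

# Prior over gender; the slide gives P(female=true) = .5.
p_gender = {"female": 0.5, "male": 0.5}

# A 3x2 joint table P(color, gender). ASSUMPTION: the two variables are
# independent, so each entry is the product of the two priors.
joint = {(c, g): p_color[c] * p_gender[g] for c in colors for g in p_gender}

def marginal_color(c):
    """Recover P(color=c) by summing the joint over gender."""
    return sum(joint[(c, g)] for g in p_gender)

# A joint distribution covers every atomic event, so it sums to 1.
total = sum(joint.values())
```

Summing a row of the table over gender gives back the prior for that color, which is the same summing-out step used in inference by enumeration.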

Inference by enumeration • Start with the joint distribution

Inference by enumeration • P(HasTeeth)=.06+.12+.02=.2

Inference by enumeration • P(HasTeeth ∨ Color=Green)=.06+.12+.02+.24=.44
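The two sums above can be reproduced by enumerating a joint table. The full table is not recoverable from the slide text, so this sketch makes assumptions: the green column and the HasTeeth row match the sums shown, while the two remaining entries and the names of the non-green colors are placeholders chosen so the table sums to 1.

```python
# Joint over (has_teeth, color). Entries marked ASSUMED do not come from
# the slides; they only make the table a valid distribution.
joint = {
    (True, "green"): 0.06,
    (True, "blue"): 0.12,
    (True, "purple"): 0.02,
    (False, "green"): 0.24,
    (False, "blue"): 0.48,    # ASSUMED
    (False, "purple"): 0.08,  # ASSUMED
}

def prob(holds):
    """Add up every atomic event in which the proposition holds."""
    return sum(p for (teeth, color), p in joint.items() if holds(teeth, color))

p_teeth = prob(lambda teeth, color: teeth)
p_teeth_or_green = prob(lambda teeth, color: teeth or color == "green")
```

`p_teeth` adds the three HasTeeth entries, and `p_teeth_or_green` adds those plus the one remaining green entry, mirroring the two sums on the slides.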

Conditional Probability • Conditional or posterior probabilities • E.g., P(PlayerWins|HostOpenDoor=1 and PlayerPickDoor2 and Door1=goat) = .5 • If we know more (e.g., HostOpenDoor=3 and Door3=goat): P(PlayerWins | all evidence)=1 • Note: the less specific belief remains valid after more evidence arrives, but is not always useful • New evidence may be irrelevant, allowing simplification: • P(PlayerWins|California-earthquake)=P(PlayerWins)=.3

Conditional Probability • A general version holds for joint distributions: P(PlayerWins,HostOpensDoor1) = P(PlayerWins|HostOpensDoor1) * P(HostOpensDoor1)
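The product rule can be checked entry by entry on a small table. The joint below is hypothetical: only the marginal P(PlayerWins)=.3 is taken from the slides, and the split across HostOpensDoor1 is assumed.

```python
# ASSUMED joint over (player_wins, host_opens_door1).
joint = {
    (True, True): 0.15, (True, False): 0.15,
    (False, True): 0.35, (False, False): 0.35,
}

def marginal_host(h):
    """P(HostOpensDoor1 = h), summing out PlayerWins."""
    return sum(p for (w, host), p in joint.items() if host == h)

def conditional_wins(w, h):
    """P(PlayerWins = w | HostOpensDoor1 = h) by the definition of conditioning."""
    return joint[(w, h)] / marginal_host(h)

# Product rule: every joint entry equals conditional times marginal.
for (w, h), p in joint.items():
    assert abs(p - conditional_wins(w, h) * marginal_host(h)) < 1e-12
```

Because the rule holds for every combination of values, it can be read as an equation between whole tables rather than between single numbers.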

Inference by enumeration • Compute conditional probabilities: • P(¬HasTeeth | color=green) = P(¬HasTeeth ∧ color=green) / P(color=green) = 0.24 / (0.06 + 0.24) = 0.8

Normalization • The denominator can be viewed as a normalization constant α • P(HasTeeth | color=green) = α P(HasTeeth, color=green) = α[P(HasTeeth, color=green, female) + P(HasTeeth, color=green, ¬female)] = α[<0.03,0.12> + <0.03,0.12>] = α<0.06,0.24> = <0.2,0.8>, where each vector lists the values for HasTeeth=true and HasTeeth=false • Compute the distribution on the query variable by fixing the evidence variables and summing over the hidden variables
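The normalization step can be sketched as follows: fix the evidence (color=green), sum out the hidden variable (female), then rescale so the result sums to 1. The four entries are the ones that add up to <0.06, 0.24> on the slide; the variable names are illustrative.

```python
# Joint entries with color=green, keyed by (has_teeth, female).
green = {
    (True, True): 0.03, (True, False): 0.03,
    (False, True): 0.12, (False, False): 0.12,
}

# For each value of the query variable HasTeeth, sum over the hidden variable.
unnormalized = [
    sum(p for (teeth, female), p in green.items() if teeth == value)
    for value in (True, False)
]

# alpha is 1 / P(color=green); it never has to be looked up separately.
alpha = 1 / sum(unnormalized)
posterior = [alpha * p for p in unnormalized]  # <P(teeth|green), P(no teeth|green)>
```

The point of the trick is that P(color=green) falls out of the computation as the sum of the unnormalized entries, so only the joint entries consistent with the evidence are needed.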

Inference by enumeration

Independence • A and B are independent iff P(A|B)=P(A) or P(B|A)=P(B) or P(A,B)=P(A)P(B) • 32 entries reduced to 12; for n independent biased coins, 2^n -> n • Absolute independence is powerful but rare • Real domains are large, with hundreds of variables, none of which are independent
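The saving for independent coins can be illustrated directly: n numbers are enough to regenerate all 2^n joint entries. The biases below are hypothetical, and four coins are used only to keep the table small.

```python
from itertools import product
from math import isclose, prod

# ASSUMED probability of heads for each of n independent biased coins.
heads = [0.5, 0.7, 0.2, 0.9]
n = len(heads)

def joint(outcome):
    """P(outcome) as a product of per-coin probabilities (independence)."""
    return prod(p if is_heads else 1 - p for p, is_heads in zip(heads, outcome))

# The full table has 2^n entries, all derived from only n stored numbers.
table = {o: joint(o) for o in product([True, False], repeat=n)}

def prob(holds):
    return sum(p for o, p in table.items() if holds(o))

# Check P(A, B) = P(A) P(B) for the first two coins.
p0 = prob(lambda o: o[0])
p1 = prob(lambda o: o[1])
p01 = prob(lambda o: o[0] and o[1])
```

Here `len(table)` is 16 while `len(heads)` is 4, which is the 2^n -> n reduction in miniature.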

Conditional Independence • If I have length <=.2, the probability that I am female does not depend on whether or not I have teeth: P(female|length<=.2,hasteeth)=P(female|length<=.2) • The same independence holds if my length is >.2: • P(male|length>.2,hasteeth)=P(male|length>.2) • Gender is conditionally independent of hasteeth given length
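A sketch of how conditional independence shows up numerically. The joint is built from made-up factors so that gender and teeth are conditionally independent given length by construction; all numbers and the names `p_short`, `p_female_given`, and `p_teeth_given` are assumptions for illustration.

```python
from itertools import product
from math import isclose

# ASSUMED factors. "short" stands for length <= .2.
p_short = 0.4
p_female_given = {True: 0.7, False: 0.3}  # P(female | short?)
p_teeth_given = {True: 0.1, False: 0.5}   # P(has_teeth | short?)

def bern(p, value):
    """Probability of a Boolean value when P(True) = p."""
    return p if value else 1 - p

# Joint over (short, female, teeth) = P(short) P(female|short) P(teeth|short).
joint = {
    (s, f, t): bern(p_short, s) * bern(p_female_given[s], f) * bern(p_teeth_given[s], t)
    for s, f, t in product([True, False], repeat=3)
}

def prob(holds):
    return sum(p for key, p in joint.items() if holds(*key))

def cond(query, given):
    """P(query | given) computed from the joint."""
    return prob(lambda s, f, t: query(s, f, t) and given(s, f, t)) / prob(given)

is_female = lambda s, f, t: f
# Knowing about teeth changes nothing once length is known ...
with_teeth = cond(is_female, lambda s, f, t: s and t)
without_teeth = cond(is_female, lambda s, f, t: s)
# ... but without length, teeth do carry information about gender.
female_given_teeth = cond(is_female, lambda s, f, t: t)
p_female = prob(is_female)
```

In this example `with_teeth` equals `without_teeth`, while `female_given_teeth` differs from `p_female`, so the variables are conditionally independent given length without being absolutely independent.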

In most cases, the use of conditional independence reduces the size of the representation of the joint distribution from exponential in n to linear in n • Conditional independence is our most basic and robust form of knowledge about uncertain environments
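To make the exponential-versus-linear claim concrete, here is a count of the numbers needed under one common set of assumptions that the slide does not spell out: all variables are Boolean, and there is a single cause with n effects that are conditionally independent given the cause. The function names are hypothetical.

```python
def full_joint_size(n):
    """Independent numbers in a full joint over 1 Boolean cause and n Boolean effects."""
    # 2^(n+1) entries, minus 1 because they must sum to 1.
    return 2 ** (n + 1) - 1

def factored_size(n):
    """Numbers for P(Cause) plus P(Effect_i | Cause) for each of the n effects."""
    # One number for the cause, two per effect (one for each value of the cause).
    return 1 + 2 * n

# (n, size of full joint, size of factored representation)
sizes = [(n, full_joint_size(n), factored_size(n)) for n in (1, 5, 10, 20)]
```

With 10 effects the full joint needs 2047 numbers and the factored form needs 21; the gap widens rapidly as n grows.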

Next Class: Turing Paper • A discussion class • Graduate students and non-degree students: Anyone beyond a bachelor’s: • Prepare a short statement on the paper. Can be your reaction, your position, a place where you disagree, an explication of a point. • Undergraduates: Be prepared with questions for the graduate students • All: Submit your statement or your question by midnight Wed night. • All statements and questions will be printed and distributed in class on Wednesday.
