Decision Making Under Uncertainty Russell and Norvig: ch 16 CMSC421 – Fall 2006
Utility-Based Agent • [diagram: an agent, with sensors and actuators, interacting with its environment]
Non-deterministic vs. Probabilistic Uncertainty • Non-deterministic model: outcomes {a, b, c}; decision that is best for the worst case (~ adversarial search) • Probabilistic model: outcomes {a(pa), b(pb), c(pc)}; decision that maximizes expected utility value
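A minimal sketch of this contrast, with made-up outcomes and probabilities (the actions and numbers below are hypothetical, not from the slides):

```python
# Toy sketch: the two decision rules from the slide above.
# Each action maps to a list of (probability, utility) outcome pairs.

def worst_case_choice(actions):
    """Non-deterministic model: pick the action whose worst outcome is best."""
    return max(actions, key=lambda a: min(u for _, u in actions[a]))

def meu_choice(actions):
    """Probabilistic model: pick the action with the highest expected utility."""
    return max(actions, key=lambda a: sum(p * u for p, u in actions[a]))

actions = {
    "safe":  [(1.0, 10)],                 # guaranteed 10
    "risky": [(0.9, 100), (0.1, -5)],     # usually great, occasionally bad
}
print(worst_case_choice(actions))  # 'safe'  (worst case 10 beats worst case -5)
print(meu_choice(actions))         # 'risky' (expected utility 89.5 beats 10)
```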
Expected Utility • Random variable X with n values x1, …, xn and distribution (p1, …, pn). E.g., xi is Resulti(A), the state reached after doing action A given E, what we know about the current state • Function U of X. E.g., U is the utility of a state • The expected utility of A is EU[A | E] = Σi=1,…,n P(xi | A) U(xi) = Σi=1,…,n P(Resulti(A) | Do(A), E) U(Resulti(A))
One State/One Action Example • [diagram: from state s0, action A1 leads to states s1, s2, s3 with probabilities 0.2, 0.7, 0.1 and utilities 100, 50, 70] • U(S0) = 100 x 0.2 + 50 x 0.7 + 70 x 0.1 = 20 + 35 + 7 = 62
One State/Two Actions Example • [diagram: from s0, action A1 leads to s1, s2, s3 (probabilities 0.2, 0.7, 0.1; utilities 100, 50, 70); action A2 leads to s2, s4 (probabilities 0.2, 0.8; utilities 50, 80)] • U1(S0) = 62 • U2(S0) = 74 • U(S0) = max{U1(S0), U2(S0)} = 74
Introducing Action Costs • [same diagram, with action costs: A1 costs 5, A2 costs 25] • U1(S0) = 62 - 5 = 57 • U2(S0) = 74 - 25 = 49 • U(S0) = max{U1(S0), U2(S0)} = 57
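A small Python sketch reproducing the arithmetic of the three examples above, using the probabilities, utilities, and costs given on the slides:

```python
# Expected-utility arithmetic for the one-state examples above.
# Outcomes are (probability, utility) pairs; an optional cost is subtracted.

def expected_utility(outcomes, cost=0.0):
    """EU of an action: sum of p * U over its outcomes, minus the action's cost."""
    return sum(p * u for p, u in outcomes) - cost

A1 = [(0.2, 100), (0.7, 50), (0.1, 70)]   # -> 62
A2 = [(0.2, 50), (0.8, 80)]               # -> 74

# Without costs: U(S0) = max(62, 74) = 74
print(max(expected_utility(A1), expected_utility(A2)))

# With action costs 5 and 25: U(S0) = max(57, 49) = 57
print(max(expected_utility(A1, cost=5), expected_utility(A2, cost=25)))
```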
MEU Principle • A rational agent should choose the action that maximizes its expected utility • This is the basis of the field of decision theory • It is the normative criterion for rational choice of action • AI is Solved!!!
Not quite… • Must have a complete model of: • Actions • Utilities • States • Even with a complete model, the problem is computationally intractable • In fact, a truly rational agent takes into account the utility of reasoning as well (bounded rationality) • Nevertheless, great progress has been made in this area recently, and we can solve much more complex decision-theoretic problems than ever before
We’ll look at • Decision Theoretic Reasoning • Simple decision making (ch. 16) • Sequential decision making (ch. 17)
Preferences • An agent chooses among prizes (A, B, etc.) and lotteries, i.e., situations with uncertain prizes • Lottery L = [p, A; (1 - p), B] • Notation: A > B: A preferred to B; A ~ B: indifference between A and B; A ≥ B: B not preferred to A
Rational Preferences • Idea: preferences of a rational agent must obey constraints • Axioms of Utility Theory: • Orderability: (A > B) v (B > A) v (A ~ B) • Transitivity: (A > B) ^ (B > C) ⇒ (A > C) • Continuity: A > B > C ⇒ ∃p [p, A; 1-p, C] ~ B • Substitutability: A ~ B ⇒ [p, A; 1-p, C] ~ [p, B; 1-p, C] • Monotonicity: A > B ⇒ (p ≥ q ⇔ [p, A; 1-p, B] ≥ [q, A; 1-q, B])
Rational Preferences • Violating the constraints leads to irrational behavior • E.g., an agent with intransitive preferences can be induced to give away all its money: • if B > C, then an agent who has C would pay some amount, say $1, to get B • if A > B, then an agent who has B would pay, say, $1 to get A • if C > A, then an agent who has A would pay, say, $1 to get C • …oh, oh!
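A toy simulation of this money pump (the dollar amounts are hypothetical), just to make the cycle concrete:

```python
# Toy money-pump simulation: an agent whose preferences cycle
# (A > B, B > C, C > A) pays $1 per trade and ends up where it started.

prefers = {("B", "C"), ("A", "B"), ("C", "A")}  # (preferred, over)
trade_cycle = ["B", "A", "C"]                   # what the agent will want next

holding, money = "C", 10
for step in range(9):
    wanted = trade_cycle[step % 3]
    if (wanted, holding) in prefers:   # the agent strictly prefers `wanted`
        holding = wanted
        money -= 1                     # pays $1 to "trade up"... in a circle
print(holding, money)                  # back to C, with only $1 left
```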
Rational Preferences ⇒ Utility • Theorem (Ramsey, 1931; von Neumann and Morgenstern, 1944): given preferences satisfying the constraints, there exists a real-valued function U such that U(A) ≥ U(B) ⇔ A ≥ B and U([p1, S1; …; pn, Sn]) = Σi pi U(Si) • MEU principle: choose the action that maximizes expected utility
Utility Assessment • Standard approach to assessment of human utilities: compare a given state A to a standard lottery Lp that has the best possible prize with probability p and the worst possible catastrophe with probability (1 - p) • Adjust the lottery probability p until A ~ Lp • [diagram: lottery Lp with outcomes "continue as before" (probability p) and "instant death" (probability 1 - p)]
Aside: Money Utility Function • Given a lottery L with expected monetary value EMV(L), usually U(L) < U(EMV(L)); i.e., people are risk-averse • Would you rather have $1,000,000 for sure, or a lottery [0.5, $0; 0.5, $3,000,000]?
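A small sketch of why a risk-averse agent prefers the sure million: assume a concave utility of money (the log form below is a common textbook choice, not something given on the slides):

```python
# Risk aversion with a concave utility of money, U(x) = log(1 + x).
import math

def U(x):
    return math.log1p(x)  # concave: marginal utility of money decreases

sure_thing = U(1_000_000)
lottery = 0.5 * U(0) + 0.5 * U(3_000_000)  # EMV = $1,500,000 > $1,000,000

# Even though the lottery's EMV is higher, its expected utility is lower,
# so U(L) < U(EMV(L)) for this agent.
print(sure_thing > lottery)  # True
```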
Decision Networks • Extend BNs to handle actions and utilities • Also called Influence diagrams • Make use of BN inference • Can do Value of Information calculations
Decision Networks cont. • Chance nodes: random variables, as in BNs • Decision nodes: actions that the decision maker can take • Utility/value nodes: the utility of the outcome state
Umbrella Network • [diagram: decision node Take Umbrella (take / don't take); chance nodes rain and umbrella; utility node happiness] • P(rain) = 0.4 • P(umb|take) = 1.0, P(~umb|~take) = 1.0 • U(~umb, ~rain) = 100, U(~umb, rain) = -100, U(umb, ~rain) = 0, U(umb, rain) = -25
Evaluating Decision Networks • Set the evidence variables for the current state • For each possible value of the decision node: • Set the decision node to that value • Calculate the posterior probability of the parent nodes of the utility node, using BN inference • Calculate the resulting utility for the action • Return the action with the highest utility
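A minimal sketch of this procedure. The `posterior(network, evidence)` inference call, the `network.decision_node` name, and `network.utility(outcome)` are hypothetical placeholders, not a real library API:

```python
# Sketch of the evaluation loop above; all helpers are hypothetical.

def evaluate_decision_network(network, evidence, decision_values, posterior):
    best_action, best_eu = None, float("-inf")
    for d in decision_values:
        # 1-2. set the evidence variables and the decision node
        evidence_d = {**evidence, network.decision_node: d}
        # 3. posterior over the parents of the utility node (BN inference),
        # 4. then the resulting expected utility for this action
        eu = sum(p * network.utility(outcome)
                 for outcome, p in posterior(network, evidence_d).items())
        if eu > best_eu:
            best_action, best_eu = d, eu
    # 5. return the action with the highest expected utility
    return best_action
```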
Umbrella Network #1 • [same network, but now the umbrella node is noisy: P(umb|take) = 0.8, P(umb|~take) = 0.1; P(rain) = 0.4; utilities as before] • #1: EU(take) = 100 x 0.12 + (-100) x 0.08 + 0 x 0.48 + (-25) x 0.32 = ???
Umbrella Network #2 • [same network as #1] • #2: EU(~take) = 100 x 0.54 + (-100) x 0.36 + 0 x 0.06 + (-25) x 0.04 = ??? • So, in this case I would…?
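A concrete sketch of the two sums in #1 and #2, using the joint P(umbrella, rain) = P(umbrella | decision) x P(rain) with the numbers from the slides:

```python
# Expected utilities for the noisy umbrella network (#1 and #2 above).

P_RAIN = 0.4
P_UMB = {"take": 0.8, "dont_take": 0.1}   # P(umb | take), P(umb | ~take)
U = {("~umb", "~rain"): 100, ("~umb", "rain"): -100,
     ("umb", "~rain"): 0,    ("umb", "rain"): -25}

def eu(decision):
    total = 0.0
    for umb, p_u in (("umb", P_UMB[decision]), ("~umb", 1 - P_UMB[decision])):
        for rain, p_r in (("rain", P_RAIN), ("~rain", 1 - P_RAIN)):
            total += p_u * p_r * U[(umb, rain)]
    return total

print(eu("take"), eu("dont_take"))  # computes the two "???" sums above
```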
Value of Information • Idea: compute the expected value of acquiring possible evidence • Example: buying oil drilling rights • Two blocks A and B, exactly one of them has oil, worth k • Prior probability 0.5 each • Current price of each block is k/2 • What is the value of getting a survey of A done? • The survey will say "oil in A" or "no oil in A", each with probability 0.5 • Compute the expected value of information (VOI): the expected value of the best action given the information minus the expected value of the best action without the information • VOI(Survey) = [0.5 x value of buying A given oil in A] + [0.5 x value of buying B given no oil in A] - 0 = ??
Value of Information (VOI) • Suppose the agent's current knowledge is E. The value of the current best action α is EU(α | E) = maxA Σi P(Resulti(A) | Do(A), E) U(Resulti(A)) • The value of the new best action (after new evidence E' is obtained) is EU(αE' | E, E') = maxA Σi P(Resulti(A) | Do(A), E, E') U(Resulti(A)) • The value of information for E' is VOI(E') = Σk P(E' = ek | E) EU(αek | E, E' = ek) - EU(α | E)
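A minimal sketch of these formulas, assuming a hypothetical `eu(action, extra_evidence)` helper that evaluates the decision network as in the earlier sketch:

```python
# Sketch of the VOI computation; `eu` is a hypothetical evaluation helper.

def value_of_information(actions, evidence_values, p_evidence, eu):
    # EU(alpha | E): value of the best action given current knowledge only
    baseline = max(eu(a, None) for a in actions)
    # sum over possible observations e of P(e | E) * EU(best action | E, e)
    informed = sum(p_evidence[e] * max(eu(a, e) for a in actions)
                   for e in evidence_values)
    return informed - baseline
```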
Umbrella Network (with forecast) • [diagram: decision node Take Umbrella; chance nodes rain, umbrella, and forecast (forecast depends on rain); utility node happiness] • P(rain) = 0.4 • P(umb|take) = 0.8, P(umb|~take) = 0.1 • U(~umb, ~rain) = 100, U(~umb, rain) = -100, U(umb, ~rain) = 0, U(umb, rain) = -25
VOI • VOI(forecast) = P(rainy) EU(best action | rainy) + P(~rainy) EU(best action | ~rainy) - EU(best action without the forecast)
Quantities to compute: • #1: EU(take | rainy) • #2: EU(~take | rainy) • #3: EU(take | ~rainy) • #4: EU(~take | ~rainy)
Umbrella Network (with forecast) • [same network; the forecast's marginal is P(F = rainy) = 0.4] • P(umb|take) = 0.8, P(umb|~take) = 0.1 • U(~umb, ~rain) = 100, U(~umb, rain) = -100, U(umb, ~rain) = 0, U(umb, rain) = -25
Summary: Simple Decision Making • Decision Theory = Probability Theory + Utility Theory • Rational Agent operates by MEU • Decision Networks • Value of Information