
Decision Making Under Uncertainty


Presentation Transcript


  1. Decision Making Under Uncertainty Russell and Norvig: ch 16 CMSC421 – Fall 2006

  2. Utility-Based Agent [figure: the agent senses the environment through sensors and acts on it through actuators; the '?' marks the decision component that maps what the agent perceives to what it does]

  3. Non-deterministic vs. Probabilistic Uncertainty • Non-deterministic model: possible outcomes {a, b, c}; choose the decision that is best for the worst case (~ adversarial search) • Probabilistic model: possible outcomes {a(pa), b(pb), c(pc)}; choose the decision that maximizes expected utility value

  4. Expected Utility • Random variable X with n values x1,…,xn and distribution (p1,…,pn). E.g.: xi is Resulti(A), the state reached after doing action A given E, what we know about the current state • Function U of X. E.g., U is the utility of a state • The expected utility of A is EU[A|E] = Σi=1,…,n p(xi|A) U(xi) = Σi=1,…,n P(Resulti(A) | Do(A), E) U(Resulti(A))

  5. One State/One Action Example [figure: action A1 from state s0 leads to s1, s2, s3 with probabilities 0.2, 0.7, 0.1 and utilities 100, 50, 70] U(S0) = 100 x 0.2 + 50 x 0.7 + 70 x 0.1 = 20 + 35 + 7 = 62
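
A minimal sketch (in Python, not part of the original slides) of the slide-5 computation, with the outcome probabilities and utilities taken directly from the example:

```python
# Expected utility of action A1 from state s0 (slide 5):
# outcomes s1, s2, s3 with probabilities 0.2, 0.7, 0.1 and utilities 100, 50, 70.
outcomes = [(0.2, 100), (0.7, 50), (0.1, 70)]

eu_a1 = sum(p * u for p, u in outcomes)
print(eu_a1)  # 62.0 = 20 + 35 + 7
```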

  6. One State/Two Actions Example [figure: A1 from s0 leads to s1, s2, s3 with probabilities 0.2, 0.7, 0.1; A2 from s0 leads to s2 and s4 with probabilities 0.2 and 0.8; utilities U(s1) = 100, U(s2) = 50, U(s3) = 70, U(s4) = 80] • U1(S0) = 62 • U2(S0) = 50 x 0.2 + 80 x 0.8 = 74 • U(S0) = max{U1(S0), U2(S0)} = 74

  7. Introducing Action Costs [figure: as on slide 6, with cost 5 attached to A1 and cost 25 attached to A2] • U1(S0) = 62 – 5 = 57 • U2(S0) = 74 – 25 = 49 • U(S0) = max{U1(S0), U2(S0)} = 57
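
The same idea extended to slides 6 and 7, again as an illustrative sketch rather than code from the course; the action names, outcome lists, and costs mirror the two examples above:

```python
# Choose the action with the highest expected utility, net of action costs.
actions = {
    "A1": {"outcomes": [(0.2, 100), (0.7, 50), (0.1, 70)], "cost": 5},
    "A2": {"outcomes": [(0.2, 50), (0.8, 80)], "cost": 25},
}

def net_expected_utility(action):
    return sum(p * u for p, u in action["outcomes"]) - action["cost"]

values = {name: net_expected_utility(a) for name, a in actions.items()}
print(values)                       # {'A1': 57.0, 'A2': 49.0}
print(max(values, key=values.get))  # 'A1'
```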

  8. MEU Principle • A rational agent should choose the action that maximizes the agent's expected utility • This is the basis of the field of decision theory • Normative criterion for rational choice of action • AI is Solved!!!

  9. Not quite… • Must have a complete model of: • actions • utilities • states • Even if you have a complete model, decision making will be computationally intractable • In fact, a truly rational agent takes into account the utility of reasoning as well (bounded rationality) • Nevertheless, great progress has been made in this area recently, and we are able to solve much more complex decision-theoretic problems than ever before

  10. We’ll look at • Decision Theoretic Reasoning • Simple decision making (ch. 16) • Sequential decision making (ch. 17)

  11. Preferences • An agent chooses among prizes (A, B, etc.) and lotteries, i.e., situations with uncertain prizes • Lottery L = [p, A; (1 – p), B] • Notation: A > B: A preferred to B; A ~ B: indifference between A and B; A ≥ B: B not preferred to A

  12. Rational Preferences • Idea: preferences of a rational agent must obey constraints • Axioms of Utility Theory • Orderability: (A > B) v (B > A) v (A ~ B) • Transitivity: (A > B) ^ (B > C) ⇒ (A > C) • Continuity: A > B > C ⇒ ∃p [p, A; 1-p, C] ~ B • Substitutability: A ~ B ⇒ [p, A; 1-p, C] ~ [p, B; 1-p, C] • Monotonicity: A > B ⇒ (p ≥ q ⇔ [p, A; 1-p, B] ≥ [q, A; 1-q, B])

  13. Rational Preferences • Violating the constraints leads to irrational behavior • E.g.: an agent with intransitive preferences can be induced to give away all its money • if B > C, then an agent who has C would pay some amount, say $1, to get B • if A > B, then an agent who has B would pay, say, $1 to get A • if C > A, then an agent who has A would pay, say, $1 to get C • …oh, oh!

  14. Rational Preferences ⇒ Utility • Theorem (Ramsey, 1931; von Neumann and Morgenstern, 1944): Given preferences satisfying the constraints, there exists a real-valued function U such that U(A) ≥ U(B) ⇔ A ≥ B, and U([p1,S1; …; pn,Sn]) = Σi pi U(Si) • MEU principle: Choose the action that maximizes expected utility
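
A small sketch of the second part of the theorem, the utility of a lottery as the probability-weighted sum of outcome utilities; the utility values here are made up purely for illustration:

```python
def lottery_utility(lottery, U):
    """lottery: list of (probability, outcome) pairs; U: utility of each outcome."""
    return sum(p * U[s] for p, s in lottery)

U = {"A": 10, "B": 4, "C": 0}   # hypothetical utilities
L = [(0.7, "A"), (0.3, "C")]    # the lottery [0.7, A; 0.3, C]
print(lottery_utility(L, U))    # 7.0 > U["B"], so the agent prefers L to B
```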

  15. Utility Assessment • Standard approach to assessment of human utilities: compare a given state A to a standard lottery Lp that has the best possible prize (e.g., continue as before) with probability p and the worst possible catastrophe (e.g., instant death) with probability (1 – p) • Adjust the lottery probability p until A ~ Lp

  16. Aside: Money ≠ Utility function • Given a lottery L with expected monetary value EMV(L), usually U(L) < U(EMV(L)), i.e., people are risk-averse • Would you rather have $1,000,000 for sure, or a lottery with [0.5, $0; 0.5, $3,000,000]?
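
To make the risk-aversion point concrete: with any concave utility of money (square root is used here purely as an illustrative assumption), the sure million beats the lottery even though the lottery's expected monetary value is higher:

```python
import math

U = math.sqrt                              # assumed concave utility of money (illustrative)

sure_thing = U(1_000_000)                  # utility of the sure $1,000,000 -> 1000.0
lottery = 0.5 * U(0) + 0.5 * U(3_000_000)  # expected utility of [0.5, $0; 0.5, $3,000,000] -> ~866
print(sure_thing > lottery)                # True, even though the lottery's EMV is $1,500,000
```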

  17. Decision Networks • Extend BNs to handle actions and utilities • Also called Influence diagrams • Make use of BN inference • Can do Value of Information calculations

  18. Decision Networks cont. • Chance nodes: random variables, as in BNs • Decision nodes: actions that decision maker can take • Utility/value nodes: the utility of the outcome state.

  19. R&N example

  20. Prenatal Testing Example

  21. Umbrella Network [figure] • Decision node: Take Umbrella (take / don't take) • Chance nodes: rain, with P(rain) = 0.4; umbrella, with P(umb|take) = 1.0 and P(~umb|~take) = 1.0 • Utility node: happiness, with U(~umb, ~rain) = 100, U(~umb, rain) = -100, U(umb, ~rain) = 0, U(umb, rain) = -25

  22. Evaluating Decision Networks • Set the evidence variables for the current state • For each possible value of the decision node: • set the decision node to that value • calculate the posterior probability of the parent nodes of the utility node, using BN inference • calculate the resulting utility for the action • Return the action with the highest utility (see the sketch after slide 23 below)

  23. Umbrella Network [figure, as on slide 21] • P(rain) = 0.4 • P(umb|take) = 1.0, P(umb|~take) = 0 • U(~umb, ~rain) = 100, U(~umb, rain) = -100, U(umb, ~rain) = 0, U(umb, rain) = -25
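
A sketch of the slide-22 evaluation loop applied to the umbrella network of slide 23; the variable names and the brute-force enumeration over the chance variables are illustrative choices, not from the slides:

```python
P_rain = 0.4
P_umb_given = {"take": 1.0, "dont": 0.0}   # P(umbrella | decision), slide 23
utility = {                                 # U(umbrella, rain)
    (False, False): 100, (False, True): -100,
    (True, False): 0,    (True, True): -25,
}

def expected_utility(decision):
    """Sum utility over the joint distribution of the chance nodes."""
    eu = 0.0
    for rain, p_r in [(True, P_rain), (False, 1 - P_rain)]:
        p_umb = P_umb_given[decision]
        for umb, p_u in [(True, p_umb), (False, 1 - p_umb)]:
            eu += p_r * p_u * utility[(umb, rain)]
    return eu

print(expected_utility("take"), expected_utility("dont"))  # -10.0  20.0
print(max(["take", "dont"], key=expected_utility))          # 'dont'
```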

  24. Umbrella Network [figure, as on slide 21] • P(rain) = 0.4 • P(umb|take) = 0.8, P(umb|~take) = 0.1 • U(~umb, ~rain) = 100, U(~umb, rain) = -100, U(umb, ~rain) = 0, U(umb, rain) = -25 • #1: EU(take) = 100 x 0.12 + (-100) x 0.08 + 0 x 0.48 + (-25) x 0.32 = ???

  25. Umbrella Network [figure, as on slide 24] • P(rain) = 0.4 • P(umb|take) = 0.8, P(umb|~take) = 0.1 • U(~umb, ~rain) = 100, U(~umb, rain) = -100, U(umb, ~rain) = 0, U(umb, rain) = -25 • #2: EU(~take) = 100 x 0.54 + (-100) x 0.36 + 0 x 0.06 + (-25) x 0.04 = ??? • So, in this case I would…?
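
For the modified network of slides 24 and 25 (P(umb|take) = 0.8, P(umb|~take) = 0.1), the joint probabilities shown on the slides plug straight into the utility table; a sketch of the arithmetic behind the "???" placeholders:

```python
# Joint probabilities from slide 24 (decision = take) and slide 25 (decision = ~take).
EU_take  = 100 * 0.12 + (-100) * 0.08 + 0 * 0.48 + (-25) * 0.32   # = -4.0
EU_ntake = 100 * 0.54 + (-100) * 0.36 + 0 * 0.06 + (-25) * 0.04   # = 17.0
print(EU_take, EU_ntake)  # with this umbrella model, not taking the umbrella is better
```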

  26. Value of Information • Idea: compute the expected value of acquiring possible evidence • Example: buying oil drilling rights • Two blocks A and B, exactly one of them has oil, worth k • Prior probability 0.5 • Current price of each block is k/2 • What is the value of getting a survey of A done? • The survey will say 'oil in A' or 'no oil in A' with probability 0.5 each • Compute the expected value of information (VOI): the expected value of the best action given the information minus the expected value of the best action without the information • VOI(Survey) = [0.5 x value of buy A given oil in A] + [0.5 x value of buy B given no oil in A] – 0 = ??
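
A sketch of the oil-drilling calculation, under the assumptions stated on the slide (a perfect survey, block price k/2, and exactly one of the two blocks containing oil worth k):

```python
k = 1.0                          # normalise the value of the oil

# Without the survey: buying either block has expected value 0.5*k - k/2 = 0.
ev_without = 0.5 * k - k / 2

# With the survey: each report has probability 0.5, and in either case the
# agent buys the block known to contain oil, for a profit of k - k/2.
ev_with = 0.5 * (k - k / 2) + 0.5 * (k - k / 2)

print(ev_with - ev_without)      # 0.5, i.e. the survey itself is worth k/2
```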

  27. Value of Information (VOI) • Suppose the agent's current knowledge is E. The value of the current best action α is EU(α|E) = maxA Σi U(Resulti(A)) P(Resulti(A) | Do(A), E) • The value of the new best action α' (after new evidence E' is obtained) is EU(α'|E, E') = maxA Σi U(Resulti(A)) P(Resulti(A) | Do(A), E, E') • The value of information for E' is the expected improvement: VOI(E') = Σk P(E' = ek | E) EU(αk | E, E' = ek) – EU(α | E)
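
The same definition as a generic helper (a sketch; the function name and argument shapes are mine, not from the slides): VOI is the expectation, over the possible evidence values, of the best achievable expected utility, minus the expected utility of the best action chosen without the evidence.

```python
def value_of_information(p_evidence, eu_best_given, eu_best_without):
    """p_evidence: {value: P(E' = value | E)}
    eu_best_given: {value: EU of the best action given E' = value}
    eu_best_without: EU of the best action given E alone."""
    expected_with = sum(p * eu_best_given[e] for e, p in p_evidence.items())
    return expected_with - eu_best_without

# The oil-drilling example of slide 26 (values of the best action, in units of k):
print(value_of_information({"oil in A": 0.5, "no oil in A": 0.5},
                           {"oil in A": 0.5, "no oil in A": 0.5}, 0.0))  # 0.5
```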

  28. Umbrella Network [figure, as on slide 24, with an added chance node forecast observed before the decision] • P(rain) = 0.4 • P(umb|take) = 0.8, P(umb|~take) = 0.1 • U(~umb, ~rain) = 100, U(~umb, rain) = -100, U(umb, ~rain) = 0, U(umb, rain) = -25

  29. VOI • VOI(forecast) = P(rainy) EU(α_rainy) + P(~rainy) EU(α_~rainy) – EU(α), where α_rainy and α_~rainy are the best actions given each forecast value and α is the best action without the forecast

  30. [figure: the four sub-computations: #1: EU(take | rainy), #2: EU(~take | rainy), #3: EU(take | ~rainy), #4: EU(~take | ~rainy)]

  31. Umbrella Network [figure, as on slide 28] • P(F = rainy) = 0.4 • P(umb|take) = 0.8, P(umb|~take) = 0.1 • U(~umb, ~rain) = 100, U(~umb, rain) = -100, U(umb, ~rain) = 0, U(umb, rain) = -25

  32. [figure: the four sub-computations again: #1: EU(take | rainy), #2: EU(~take | rainy), #3: EU(take | ~rainy), #4: EU(~take | ~rainy)]

  33. VOI • VOI(forecast) = P(rainy) EU(α_rainy) + P(~rainy) EU(α_~rainy) – EU(α)

  34. Summary: Simple Decision Making • Decision Theory = Probability Theory + Utility Theory • Rational Agent operates by MEU • Decision Networks • Value of Information
