
A Not-So-Quick Overview of Probability

William W. Cohen, Machine Learning 10-605.


Presentation Transcript


  1. A Not-So-Quick Overview of Probability William W. Cohen Machine Learning 10-605

  2. Warmup: Zeno’s paradox • 1 + 0.1 + 0.01 + 0.001 + 0.0001 + … = ? • Lance Armstrong and the tortoise have a race • Lance is 10x faster • The tortoise has a 1m head start at time 0 • So, when Lance gets to 1m the tortoise is at 1.1m • So, when Lance gets to 1.1m the tortoise is at 1.11m • So, when Lance gets to 1.11m the tortoise is at 1.111m … and Lance will never catch up? • Unresolved until calculus was invented
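The warmup sum is a geometric series with ratio 1/10. The sketch below is an illustration rather than part of the lecture, and its function and variable names are assumptions. It uses exact fractions to show the partial sums closing in on 10/9 m, the point at which Lance draws level with the tortoise.

```python
from fractions import Fraction

def partial_sum(n_terms: int) -> Fraction:
    """Sum of the first n_terms of 1 + 0.1 + 0.01 + ... using exact fractions."""
    return sum(Fraction(1, 10) ** k for k in range(n_terms))

# Closed form for a geometric series with first term 1 and ratio r = 1/10.
limit = 1 / (1 - Fraction(1, 10))  # 10/9

for n in (1, 2, 3, 5, 10):
    gap = limit - partial_sum(n)
    # The gap shrinks by a factor of 10 with every extra term.
    print(n, float(partial_sum(n)), float(gap))
```

The gap to the limit never reaches zero after finitely many terms, which is the paradox, but it shrinks geometrically, which is why the infinite sum is finite.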

  3. The prosecution calls Gottfried Leibniz.

  4. The Problem of Induction • David Hume (1711-1776): pointed out • Empirically, induction seems to work • Statement (1) is an application of induction. • This stumped people for about 200 years • Of the Different Species of Philosophy • Of the Origin of Ideas • Of the Association of Ideas • Sceptical Doubts Concerning the Operations of the Understanding • Sceptical Solution of These Doubts • Of Probability • Of the Idea of Necessary Connexion • Of Liberty and Necessity • Of the Reason of Animals • Of Miracles • Of A Particular Providence and of A Future State • Of the Academical Or Sceptical Philosophy

  5. A Second Problem of Induction • A black crow seems to support the hypothesis “all crows are black”. • A pink highlighter supports the hypothesis “all non-black things are non-crows” • Thus, a pink highlighter supports the hypothesis “all crows are black”.

  6. Probability Theory • Events • discrete random variables, boolean random variables, compound events • Axioms of probability • What defines a reasonable theory of uncertainty • Compound events • Independent events • Conditional probabilities • Bayes rule and beliefs • Joint probability distribution

  7. Discrete Random Variables • A is a Boolean-valued random variable if • A denotes an event (a possible outcome of an “experiment”), • there is uncertainty as to whether A occurs (the experiment is not deterministic). • Define P(A) as “the fraction of experiments in which A is true” • We’re assuming all possible outcomes are equiprobable

  8. Visualizing A • Event space of all possible worlds: its area is 1 • Worlds in which A is true: P(A) = area of reddish oval • Worlds in which A is false

  9. Discrete Random Variables • A is a Boolean-valued random variable if • A denotes an event (a possible outcome of an “experiment”), • there is uncertainty as to whether A occurs (the experiment is not deterministic). • Define P(A) as “the fraction of experiments in which A is true” • We’re assuming all possible outcomes are equiprobable • Examples • You roll two 6-sided dice (the experiment) and get doubles (A=doubles, the outcome) • I pick two students in the class (the experiment) and they have the same birthday (A=same birthday, the outcome) • A = I have Ebola • A = The US president in 2023 will be male • A = You wake up tomorrow with a headache • A = the 1,000,000,000,000th digit of π is 7

  10. The Axioms of Probability • 0 <= P(A) <= 1 • P(True) = 1 • P(False) = 0 • P(A or B) = P(A) + P(B) - P(A and B) Events, random variables, …., probabilities “Dice” “Experiments”

  11. (This is Andrew’s joke) The Axioms Of Probability

  12. These Axioms are Not to be Trifled With • There have been many other approaches to understanding “uncertainty”: • Fuzzy Logic, three-valued logic, Dempster-Shafer, non-monotonic reasoning, … • 25 years ago people in AI argued about these; now they mostly don’t • Any scheme for combining uncertain information, uncertain “beliefs”, etc. really should obey these axioms • If you gamble based on “uncertain beliefs”, then [you can be exploited by an opponent] if and only if [your uncertainty formalism violates the axioms] - de Finetti 1931 (the “Dutch book argument”)

  13. Interpreting the axioms • 0 <= P(A) <= 1 • P(True) = 1 • P(False) = 0 • P(A or B) = P(A) + P(B) - P(A and B) The area of A can’t get any smaller than 0 And a zero area would mean no world could ever have A true

  14. Interpreting the axioms • 0 <= P(A) <= 1 • P(True) = 1 • P(False) = 0 • P(A or B) = P(A) + P(B) - P(A and B) The area of A can’t get any bigger than 1 And an area of 1 would mean all worlds will have A true

  15. Interpreting the axioms • 0 <= P(A) <= 1 • P(True) = 1 • P(False) = 0 • P(A or B) = P(A) + P(B) - P(A and B) [Figure: overlapping regions A and B]

  16. Interpreting the axioms • 0 <= P(A) <= 1 • P(True) = 1 • P(False) = 0 • P(A or B) = P(A) + P(B) - P(A and B) [Figure: overlapping regions A and B, with P(A or B) and P(A and B) marked] Simple addition and subtraction
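The fourth axiom can be checked by brute-force counting in a small equiprobable event space. The sketch below is only an illustration: it uses the two-dice experiment from slide 9, with A = doubles, and a hypothetical second event B = "the total is at least 10" that does not appear in the slides.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes ("worlds") of rolling two fair 6-sided dice.
worlds = list(product(range(1, 7), repeat=2))

def prob(event) -> Fraction:
    """P(event) = fraction of equiprobable worlds in which the event is true."""
    return Fraction(sum(1 for w in worlds if event(w)), len(worlds))

def is_doubles(w):
    # Event A, taken from slide 9.
    return w[0] == w[1]

def is_high(w):
    # Event B, a hypothetical choice made for this illustration.
    return w[0] + w[1] >= 10

p_a = prob(is_doubles)
p_b = prob(is_high)
p_and = prob(lambda w: is_doubles(w) and is_high(w))
p_or = prob(lambda w: is_doubles(w) or is_high(w))

# Both sides of P(A or B) = P(A) + P(B) - P(A and B).
print(p_or, p_a + p_b - p_and)
```

Subtracting P(A and B) removes the worlds that were counted twice, once in A and once in B.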

  17. Theorems from the Axioms • 0 <= P(A) <= 1, P(True) = 1, P(False) = 0 • P(A or B) = P(A) + P(B) - P(A and B) ⇒ P(not A) = P(~A) = 1 - P(A) • Proof: P(A or ~A) = 1 and P(A and ~A) = 0 • P(A or ~A) = P(A) + P(~A) - P(A and ~A) • 1 = P(A) + P(~A) - 0

  18. Elementary Probability in Pictures • P(~A) + P(A) = 1 A ~A

  19. Side Note • I am inflicting these proofs on you for two reasons: • These kinds of manipulations will need to be second nature to you if you use probabilistic analytics in depth • Suffering is good for you (This is also Andrew’s joke)

  20. Another important theorem • 0 <= P(A) <= 1, P(True) = 1, P(False) = 0 • P(A or B) = P(A) + P(B) - P(A and B) ⇒ P(A) = P(A ^ B) + P(A ^ ~B) • Proof: A = A and (B or ~B) = (A and B) or (A and ~B) • P(A) = P(A and B) + P(A and ~B) - P((A and B) and (A and ~B)) • P(A) = P(A and B) + P(A and ~B) - P(A and A and B and ~B) • P(A and A and B and ~B) = P(False) = 0, so P(A) = P(A and B) + P(A and ~B)

  21. Elementary Probability in Pictures • P(A) = P(A ^ B) + P(A ^ ~B) A ^ B B A ^ ~B ~B

  22. The LAWS Of Probability • Laws of probability: Axioms … Monty Hall Problem proviso

  23. The Monty Hall Problem • You’re in a game show. Behind one door is a prize. Behind the others, goats. • You pick one of three doors, say #1 • The host, Monty Hall, opens one door, revealing… a goat! • You now can either • stick with your guess • always change doors • flip a coin and pick a new door randomly according to the coin

  24. The Monty Hall Problem • Case 1: you don’t swap. W = you win. Pre-goat: P(W) = 1/3. Post-goat: P(W) = 1/3. • Case 2: you swap. W1 = you picked the cash initially. W2 = you win. Pre-goat: P(W1) = 1/3. Post-goat: W2 = ~W1, so P(W2) = 1 - P(W1) = 2/3. • Moral: ?
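The 1/3 versus 2/3 split can be checked empirically. The simulation below is a minimal sketch. It assumes the host knows where the prize is, always opens a goat door other than the player's pick, and chooses at random when two such doors are available. The function names `play` and `win_rate` are illustrative.

```python
import random

def play(swap: bool, rng: random.Random) -> bool:
    """Play one round; return True if the player ends up with the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # Assumption: the host opens a goat door that is not the player's pick.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if swap:
        # Switch to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

def win_rate(swap: bool, trials: int = 100_000, seed: int = 0) -> float:
    """Fraction of simulated rounds won under the given strategy."""
    rng = random.Random(seed)
    return sum(play(swap, rng) for _ in range(trials)) / trials

print("stick:", win_rate(swap=False))
print("swap: ", win_rate(swap=True))
```

Under these assumptions the estimates should land near 1/3 for sticking and 2/3 for swapping, matching the two cases above.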

  25. The Extreme Monty Hall/Survivor Problem • You’re in a game show. There are 10,000 doors. Only one of them has a prize. • You pick a door. • Over the remaining 13 weeks, the host eliminates 9,998 of the remaining doors. • For the season finale: • Do you switch, or not? …

  26. Some practical problems • You’re the DM in a D&D game. • Joe brings his own d20 and throws 4 critical hits in a row to start off • DM = dungeon master • d20 = 20-sided die • “Critical hit” = 19 or 20 • Is Joe cheating? • What is P(A), A=four critical hits? • A is a compound event: A = C1 and C2 and C3 and C4

  27. Independent Events • Definition: two events A and B are independent if Pr(A and B)=Pr(A)*Pr(B). • Intuition: outcome of A has no effect on the outcome of B (and vice versa). • We need to assume the different rolls are independent to solve the problem. • You frequently need to assume the independence of something to solve any learning problem.

  28. Some practical problems • You’re the DM in a D&D game. • Joe brings his own d20 and throws 4 critical hits in a row to start off • DM = dungeon master • d20 = 20-sided die • “Critical hit” = 19 or 20 • What are the odds of that happening with a fair die? • Ci=critical hit on trial i, i=1,2,3,4 • P(C1 and C2 … and C4) = P(C1)*…*P(C4) = (1/10)^4 • Followup: D=pick an ace or king out of deck three times in a row: D=D1 ^ D2 ^ D3
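As a sanity check on the product rule, the sketch below computes the probability of four critical hits twice: once by multiplying, which relies on the independence assumption, and once by counting outcomes among all 20^4 equally likely sequences of four fair rolls. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# A critical hit is a 19 or 20 on a fair d20: 2 faces out of 20.
p_crit = Fraction(2, 20)

# Independence lets us multiply: P(C1 and ... and C4) = P(C1) * ... * P(C4).
p_by_product = p_crit ** 4

# Cross-check by enumerating every sequence of four rolls of a fair d20.
hits = sum(
    1
    for rolls in product(range(1, 21), repeat=4)
    if all(r >= 19 for r in rolls)
)
p_by_counting = Fraction(hits, 20 ** 4)

print(p_by_product, p_by_counting)
```

The two numbers agree because, for a fair die, every sequence of four rolls is equally likely. That is exactly the situation in which independent events multiply.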

  29. Some practical problems • The specs for the loaded d20 say that it has 20 outcomes, X where • P(X=20) = 0.25 • P(X=19) = 0.25 • for i=1,…,18, P(X=i)= Z * 1/18 • What is Z?
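One way to find Z is to use the fact that the 20 faces are mutually exclusive and exhaustive, so their probabilities must sum to 1. The sketch below works this out with exact fractions. The dictionary `distribution` is an illustrative name, not something from the slides.

```python
from fractions import Fraction

p_20 = Fraction(1, 4)
p_19 = Fraction(1, 4)

# The probabilities of all 20 faces must sum to 1:
#   P(X=20) + P(X=19) + 18 * (Z * 1/18) = 1
z = 1 - p_20 - p_19                   # total mass shared by faces 1..18
p_each_low_face = z * Fraction(1, 18)

# Full distribution over the 20 faces of the loaded die.
distribution = {face: p_each_low_face for face in range(1, 19)}
distribution[19] = p_19
distribution[20] = p_20

print(z, p_each_low_face, sum(distribution.values()))
```

This gives Z = 1/2, so each of the faces 1 to 18 has probability 1/36, which agrees with the value 0.5 * 1/18 quoted later in the deck.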

  30. Multivalued Discrete Random Variables • Suppose A can take on more than 2 values • A is a random variable with arity k if it can take on exactly one value out of {v1, v2, …, vk} • Example: V={aaliyah, aardvark, …, zymurgy, zynga} • Example: V={aaliyah_aardvark, …, zynga_zymurgy} • Thus…

  31. Terms: Binomials and Multinomials • Suppose A can take on more than 2 values • A is a random variable with arity k if it can take on exactly one value out of {v1, v2, …, vk} • Example: V={aaliyah, aardvark, …, zymurgy, zynga} • Example: V={aaliyah_aardvark, …, zynga_zymurgy} • The distribution Pr(A) is a multinomial • For k=2 the distribution is a binomial

  32. More about Multivalued Random Variables • Using the axioms of probability… 0 <= P(A) <= 1, P(True) = 1, P(False) = 0 P(A or B) = P(A) + P(B) - P(A and B) • And assuming that A obeys… • It’s easy to prove that

  33. More about Multivalued Random Variables • Using the axioms of probability and assuming that A obeys… • It’s easy to prove that • And thus we can prove

  34. Elementary Probability in Pictures [Figure: event space partitioned into regions A=1, A=2, A=3, A=4, A=5]

  35. Elementary Probability in Pictures [Figure: event space partitioned into regions A=aaliyah, A=aardvark, …, A=zynga]

  36. Some practical problems • The specs for the loaded d20 say that it has 20 outcomes, X • P(X=20) = P(X=19) = 0.25 • for i=1,…,18, P(X=i)= z … and what is z?

  37. Some practical problems • You (probably) have 8 neighbors and 5 close neighbors. • What is Pr(A), A=one or more of your neighbors has the same sign as you? • What’s the experiment? • What is Pr(B), B=you and your close neighbors all have different signs? • What about neighbors? Moral: ?
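The slide leaves the experiment open. One possible reading, stated here purely as an assumption, is that every person's zodiac sign is drawn independently and uniformly from 12 signs. Under that assumption the two questions reduce to birthday-problem arithmetic, sketched below with illustrative function names.

```python
from fractions import Fraction

N_SIGNS = 12  # assumption: 12 zodiac signs, each equally likely, drawn independently

def p_some_match(n_neighbors: int) -> Fraction:
    """P(at least one of n neighbors shares your sign) = 1 - P(none do)."""
    return 1 - Fraction(N_SIGNS - 1, N_SIGNS) ** n_neighbors

def p_all_different(n_people: int) -> Fraction:
    """P(n people all have distinct signs), adding one person at a time."""
    p = Fraction(1)
    for already_taken in range(n_people):
        p *= Fraction(N_SIGNS - already_taken, N_SIGNS)
    return p

p_a = p_some_match(8)              # A: one or more of 8 neighbors matches you
p_b = p_all_different(1 + 5)       # B: you plus 5 close neighbors all differ
p_b_all = p_all_different(1 + 8)   # the same question for all 8 neighbors

print(float(p_a), float(p_b), float(p_b_all))
```

A different choice of experiment, for instance one where real birth dates are not uniform across signs, would change these numbers, which may be part of the intended moral.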

  38. Some practical problems • I bought a loaded d20 on eBay… but it didn’t come with any specs. How can I find out how it behaves? • P(X=20) = P(X=19) = 0.25 • for i=1,…,18, P(X=i) = 0.5 * 1/18
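A natural answer, consistent with the earlier definition of P(A) as the fraction of experiments in which A is true, is to roll the die many times and record frequencies. The sketch below stands in for the physical die by sampling from the distribution quoted on the slide. The function name `estimate_distribution` and the sample size are assumptions.

```python
import random
from collections import Counter

def estimate_distribution(rolls: list[int]) -> dict[int, float]:
    """Estimate P(X=i) as the fraction of rolls that came up i."""
    counts = Counter(rolls)
    return {face: counts[face] / len(rolls) for face in range(1, 21)}

# Stand-in for physically rolling the die: simulate the loaded d20 from the slide.
faces = list(range(1, 21))
weights = [0.5 / 18] * 18 + [0.25, 0.25]   # faces 1..18, then 19 and 20
rng = random.Random(0)
rolls = rng.choices(faces, weights=weights, k=50_000)

estimate = estimate_distribution(rolls)
print(estimate[20], estimate[19], estimate[1])
```

With enough rolls the estimated frequencies should approach the true probabilities; with few rolls they can be far off, especially for the rarer faces.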

  39. Some practical problems • I have 3 standard d20 dice, 1 loaded die. • Experiment: (1) pick a d20 uniformly at random then (2) roll it. Let A=d20 picked is fair and B=roll 19 or 20 with that die. What is P(B)? P(B) = P(B and A) + P(B and ~A) = 0.1*0.75 + 0.5*0.25 = 0.2 • using Andrew’s “important theorem” P(A) = P(A ^ B) + P(A ^ ~B)

  40. Elementary Probability in Pictures • P(A) = P(A ^ B) + P(A ^ ~B) [Figure: A split into the regions A ^ B (inside B) and A ^ ~B (inside ~B)] • Followup: What if I change the ratio of fair to loaded dice in the experiment?

  41. Some practical problems • I have lots of standard d20 dice and lots of loaded dice, all identical. • Experiment is the same: (1) pick a d20 uniformly at random then (2) roll it. • Can I mix the dice together so that P(B) = 0.137? • P(B) = P(B and A) + P(B and ~A) = 0.1*λ + 0.5*(1 - λ) = 0.137 (a “mixture model”) • λ = (0.5 - 0.137)/0.4 = 0.9075
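Solving the mixture equation for λ generalizes to any target probability between the two dice's critical-hit rates. The helper below is a sketch. `mixing_weight` and its parameter names are illustrative, with the defaults taken from the fair (0.1) and loaded (0.5) dice in the slides.

```python
def mixing_weight(target: float, p_fair: float = 0.1, p_loaded: float = 0.5) -> float:
    """Solve p_fair*lam + p_loaded*(1 - lam) = target for lam, the fraction of fair dice."""
    lam = (p_loaded - target) / (p_loaded - p_fair)
    if not 0.0 <= lam <= 1.0:
        # A mixture can only reach probabilities between p_fair and p_loaded.
        raise ValueError("target is outside the range reachable by mixing")
    return lam

lam = mixing_weight(0.137)
# Plug lam back in to confirm the mixture hits the target.
p_b = 0.1 * lam + 0.5 * (1 - lam)
print(lam, p_b)
```

The range check reflects a property of mixtures: no blend of these two dice can make a critical hit rarer than 0.1 or more common than 0.5.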

  42. Another picture for this problem • It’s more convenient to say • “if you’ve picked a fair die then …” i.e. Pr(critical hit|fair die) = 0.1 • “if you’ve picked the loaded die then …” i.e. Pr(critical hit|loaded die) = 0.5 • Conditional probability: Pr(B|A) = P(B^A)/P(A) [Figure: event space split into A (fair die) and ~A (loaded); regions “A and B” and “~A and B” labeled P(B|A) and P(B|~A)]

  43. Definition of Conditional Probability • P(A|B) = P(A ^ B) / P(B) • Corollary: The Chain Rule • P(A ^ B) = P(A|B) P(B)

  44. Some practical problems • I have 3 standard d20 dice, 1 loaded die. • Experiment: (1) pick a d20 uniformly at random then (2) roll it. Let A=d20 picked is fair and B=roll 19 or 20 with that die. What is P(B)? • P(B) = P(B|A) P(A) + P(B|~A) P(~A) = 0.1*0.75 + 0.5*0.25 = 0.2 (“marginalizing out” A)
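The marginalization step is short enough to write as a function, which also makes it easy to explore the earlier followup about changing the ratio of fair to loaded dice. This is a sketch with an illustrative function name, using exact fractions for the numbers on the slide.

```python
from fractions import Fraction

def marginal_b(p_a: Fraction, p_b_given_a: Fraction, p_b_given_not_a: Fraction) -> Fraction:
    """P(B) = P(B|A) P(A) + P(B|~A) P(~A), i.e. marginalizing out A."""
    return p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_fair = Fraction(3, 4)               # P(A): 3 of the 4 dice are fair
p_crit_given_fair = Fraction(1, 10)   # P(B|A)
p_crit_given_loaded = Fraction(1, 2)  # P(B|~A)

p_crit = marginal_b(p_fair, p_crit_given_fair, p_crit_given_loaded)
print(p_crit, float(p_crit))

# Followup: vary the fraction of fair dice and watch P(B) move between 0.5 and 0.1.
for k in range(5):
    share_fair = Fraction(k, 4)
    print(share_fair, marginal_b(share_fair, p_crit_given_fair, p_crit_given_loaded))
```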

  45. P(B) = P(B|A)P(A) + P(B|~A)P(~A) [Figure: event space split into A (fair die) and ~A (loaded), labeled P(A) and P(~A); regions “A and B” and “~A and B” labeled P(B|A) and P(B|~A)]

  46. Some practical problems • I have 3 standard d20 dice, 1 loaded die. • Experiment: (1) pick a d20 uniformly at random then (2) roll it. Let A=d20 picked is fair and B=roll 19 or 20 with that die. • Suppose B happens (e.g., I roll a 20). What is the chance the die I rolled is fair? i.e. what is P(A|B) ?

  47. P(A|B) = ? • P(A and B) = P(A|B) * P(B) • P(A and B) = P(B|A) * P(A) • P(A|B) * P(B) = P(B|A) * P(A) • P(A|B) = P(B|A) * P(A) / P(B) [Figure: event space split into A (fair die) and ~A (loaded), labeled P(A) and P(~A); regions “A and B” and “~A and B” labeled P(B|A) and P(B|~A)]
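Plugging the dice numbers into the identity P(A|B) * P(B) = P(B|A) * P(A) answers the question from slide 46. The sketch below is an illustration with an assumed function name. It computes P(B) by marginalizing out A and then divides.

```python
from fractions import Fraction

def posterior(prior: Fraction, lik: Fraction, lik_not: Fraction) -> Fraction:
    """P(A|B) = P(B|A) P(A) / P(B), with P(B) = P(B|A) P(A) + P(B|~A) P(~A)."""
    evidence = lik * prior + lik_not * (1 - prior)
    return lik * prior / evidence

p_fair = Fraction(3, 4)               # P(A): 3 of the 4 dice are fair
p_crit_given_fair = Fraction(1, 10)   # P(B|A)
p_crit_given_loaded = Fraction(1, 2)  # P(B|~A)

p_fair_given_crit = posterior(p_fair, p_crit_given_fair, p_crit_given_loaded)
print(p_fair_given_crit, float(p_fair_given_crit))
```

Seeing a critical hit lowers the probability that the die is fair from the prior 3/4 to 3/8, because critical hits are five times more likely with the loaded die.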

  48. Bayes’ rule • P(A|B) = P(B|A) * P(A) / P(B) • P(B|A) = P(A|B) * P(B) / P(A) • In the first form, P(A|B) is the posterior and P(A) is the prior • Bayes, Thomas (1763) An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370-418 • “…by no means merely a curious speculation in the doctrine of chances, but necessary to be solved in order to a sure foundation for all our reasonings concerning past facts, and what is likely to be hereafter…. necessary to be considered by any that would give a clear account of the strength of analogical or inductive reasoning…”
