CHAPTER 6.1 SUMMARIZING POSSIBLE OUTCOMES AND THEIR PROBABILITIES

Presentation Transcript


  1. CHAPTER 6.1 SUMMARIZING POSSIBLE OUTCOMES AND THEIR PROBABILITIES • DEFINITION: A RANDOM VARIABLE IS A NUMERICAL MEASUREMENT OF THE OUTCOME OF A RANDOM PHENOMENON (EXPERIMENT). • DEFINITION: A DISCRETE RANDOM VARIABLE X TAKES ITS VALUES FROM A COUNTABLE SET, FOR EXAMPLE, N = {0, 1, 2, 3, 4, 5, 6, 7, . . . }. • DEFINITION: THE PROBABILITY DISTRIBUTION OF A DISCRETE RANDOM VARIABLE IS A FUNCTION P SUCH THAT, FOR ALL OUTCOMES x, 0 ≤ P(x) ≤ 1, AND THE PROBABILITIES OF ALL OUTCOMES SUM TO 1.

  2. MEAN OF A DISCRETE PROBABILITY DISTRIBUTION • THE MEAN OF A PROBABILITY DISTRIBUTION FOR A DISCRETE RANDOM VARIABLE IS GIVEN BY μ = Σ x*P(x), WHERE THE SUM IS TAKEN OVER ALL POSSIBLE VALUES x. IN WORDS, TO GET THE MEAN OF A DISCRETE PROBABILITY DISTRIBUTION, MULTIPLY EACH POSSIBLE VALUE OF THE RANDOM VARIABLE BY ITS PROBABILITY, AND THEN ADD ALL THESE PRODUCTS.
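
The multiply-and-add rule above can be sketched in a few lines of Python. The distribution below is made up purely for illustration (it is not the Red Sox data from the next slide), and the names `distribution` and `discrete_mean` are hypothetical.

```python
# Hypothetical discrete distribution: value -> probability (illustrative numbers only).
distribution = {0: 0.25, 1: 0.35, 2: 0.25, 3: 0.10, 4: 0.05}


def discrete_mean(dist):
    """Return the sum of x * P(x) after checking that dist is a valid distribution."""
    if any(p < 0 or p > 1 for p in dist.values()):
        raise ValueError("each probability must lie between 0 and 1")
    if abs(sum(dist.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Multiply each value by its probability, then add the products.
    return sum(x * p for x, p in dist.items())


mean = discrete_mean(distribution)
print(f"mean = {mean:.2f}")
```

The two checks mirror the definition of a discrete probability distribution: every probability lies between 0 and 1, and together they sum to 1.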

  3. EXAMPLE: NUMBER OF HOME RUNS IN A GAME FOR BOSTON RED SOX

  4. (1) WHAT IS THE EXPECTED (MEAN) NUMBER OF HOME RUNS FOR A BOSTON RED SOX BASEBALL GAME? (2) INTERPRET WHAT THIS MEAN (EXPECTED VALUE) MEANS.

  5. PROBABILITY FOR CONTINUOUS RANDOM VARIABLE • DEFINITION: A CONTINUOUS RANDOM VARIABLE HAS POSSIBLE VALUES THAT FORM AN INTERVAL, THAT IS, IT TAKES ITS VALUES FROM AN INTERVAL, FOR EXAMPLE, (2, 5). • DEFINITION: THE PROBABILITY DISTRIBUTION OF A CONTINUOUS RANDOM VARIABLE IS SPECIFIED BY A CURVE THAT DETERMINES THE PROBABILITY THAT THE RANDOM VARIABLE FALLS IN ANY PARTICULAR INTERVAL OF VALUES.

  6. REMARKS • EACH INTERVAL HAS PROBABILITY BETWEEN 0 AND 1. THIS IS THE AREA UNDER THE CURVE, ABOVE THAT INTERVAL. • THE INTERVAL CONTAINING ALL POSSIBLE VALUES HAS PROBABILITY EQUAL TO 1, SO THE TOTAL AREA UNDER THE CURVE EQUALS 1. • ILLUSTRATIVE PICTURES

  7. CHAPTER 6.2 FINDING PROBABILITIES FOR BELL-SHAPED DISTRIBUTIONS: THE NORMAL DISTRIBUTION • THE NORMAL DISTRIBUTION IS VERY COMMONLY USED FOR CONTINUOUS RANDOM VARIABLES. IT IS CHARACTERIZED BY A PARTICULAR SYMMETRIC, BELL-SHAPED CURVE WITH TWO PARAMETERS: THE MEAN AND THE STANDARD DEVIATION. • NOTATION • ILLUSTRATIVE PICTURES

  8. THE NORMAL DISTRIBUTION IS ALSO THE MODEL FOR A POPULATION DISTRIBUTION • THE POPULATION DISTRIBUTION OF A RANDOM VARIABLE X IS OFTEN MODELED BY A BELL-SHAPED CURVE WITH THE PROPERTY THAT THE PROPORTION OF THE POPULATION FOR WHICH X IS BETWEEN a AND b IS THE AREA UNDER THE CURVE BETWEEN a AND b. • ILLUSTRATIVE PICTURE

  9. THE EMPIRICAL OR 68 – 95 – 99.7 % RULE • THE EMPIRICAL RULE STATES THAT FOR AN APPROXIMATELY BELL-SHAPED DISTRIBUTION, ABOUT 68% OF OBSERVATIONS (VALUES) FALL WITHIN ONE STANDARD DEVIATION OF THE MEAN; ABOUT 95% OF THE VALUES FALL WITHIN TWO STANDARD DEVIATIONS OF THE MEAN; ABOUT 99.7% OF VALUES FALL WITHIN THREE STANDARD DEVIATIONS OF THE MEAN. • ILLUSTRATIVE PICTURE
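
As a quick check of the 68 – 95 – 99.7 % figures, the sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) in place of a printed table. The helper name `within` is hypothetical.

```python
from statistics import NormalDist

# Standard normal model: mean 0, standard deviation 1.
Z = NormalDist(mu=0, sigma=1)


def within(k):
    """Probability that a normal value falls within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)


coverage = {k: within(k) for k in (1, 2, 3)}
for k, prob in coverage.items():
    print(f"within {k} SD: {prob:.4f}")
```

Because any normal distribution can be converted to z-scores, the same three percentages apply whatever the mean and standard deviation are.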

  10. FINDING PROBABILITIES FOR CONTINUOUS RANDOM VARIABLES USING THE STANDARD NORMAL DISTRIBUTION TABLE • DEFINITION: THE STANDARD NORMAL DISTRIBUTION IS THE NORMAL DISTRIBUTION WITH MEAN = 0 AND STANDARD DEVIATION = 1. IT IS THE DISTRIBUTION OF NORMAL Z-SCORES. • DEFINITION: THE Z-SCORE FOR A VALUE x OF A RANDOM VARIABLE IS THE NUMBER OF STANDARD DEVIATIONS THAT x FALLS FROM THE MEAN. IT IS CALCULATED AS z = (x - μ)/σ, WHERE μ IS THE MEAN AND σ IS THE STANDARD DEVIATION.

  11. CLASS EXAMPLE 1 • IN A STANDARD NORMAL MODEL, WHAT PERCENT OF THE POPULATION IS IN EACH REGION? DRAW A PICTURE IN EACH CASE. • (A) Z < 0.83 (B) Z > 0.83 (C) 0.1 < Z < 0.9 • SOLUTION

  12. CLASS EXAMPLE 2 • IN A STANDARD NORMAL MODEL, FIND THE VALUE OF Z THAT CUTS OFF • (A) THE LOWEST 75% OF THE POPULATION; • (B) THE HIGHEST 20% OF THE POPULATION (= THE LOWEST 80%) • SOLUTION
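
The solutions to Class Examples 1 and 2 did not survive in the transcript. As a sketch only, the two kinds of lookup (area from z, and z from area) can be reproduced with `statistics.NormalDist` instead of the printed table; the variable names are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Class Example 1: areas under the standard normal curve.
p_below = Z.cdf(0.83)                # (A) P(Z < 0.83)
p_above = 1 - Z.cdf(0.83)            # (B) P(Z > 0.83)
p_between = Z.cdf(0.9) - Z.cdf(0.1)  # (C) P(0.1 < Z < 0.9)

# Class Example 2: z-values that cut off a given area to the left.
z_lowest_75 = Z.inv_cdf(0.75)   # (A) lowest 75%
z_highest_20 = Z.inv_cdf(0.80)  # (B) highest 20% = lowest 80%

print(f"P(Z < 0.83) = {p_below:.4f}")
print(f"P(Z > 0.83) = {p_above:.4f}")
print(f"P(0.1 < Z < 0.9) = {p_between:.4f}")
print(f"z for lowest 75% = {z_lowest_75:.3f}")
print(f"z for highest 20% = {z_highest_20:.3f}")
```

`cdf` plays the role of reading the table from z to area, and `inv_cdf` reads it backwards from area to z.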

  13. CLASS EXAMPLE 3 • SUPPOSE THAT WE MODEL SAT SCORES Y BY THE N(500, 100) DISTRIBUTION. • (A) WHAT PERCENTAGE OF SAT SCORES FALL BETWEEN 450 AND 600? • (B) FOR WHAT SAT VALUE b ARE 10% OF SAT SCORES GREATER THAN b? • SOLUTION
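
A minimal sketch of Class Example 3, assuming N(500, 100) means mean 500 and standard deviation 100. It converts to z-scores explicitly for part (A) and uses the inverse lookup for part (B); the names `sat`, `pct_between`, and `b` are hypothetical.

```python
from statistics import NormalDist

MEAN, SD = 500, 100           # assumed reading of N(500, 100)
Z = NormalDist()              # standard normal model
sat = NormalDist(MEAN, SD)    # model for SAT scores Y

# (A) Convert 450 and 600 to z-scores, then take the area between them.
z_low = (450 - MEAN) / SD     # -0.5
z_high = (600 - MEAN) / SD    # 1.0
pct_between = Z.cdf(z_high) - Z.cdf(z_low)

# (B) 10% of scores above b means 90% of scores below b.
b = sat.inv_cdf(0.90)

print(f"P(450 < Y < 600) = {pct_between:.4f}")
print(f"b = {b:.1f}")
```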

  14. CHAPTER 6.3 PROBABILITY MODELS FOR OBSERVATIONS WITH TWO POSSIBLE OUTCOMES • BERNOULLI TRIAL: A RANDOM EXPERIMENT WITH TWO COMPLEMENTARY EVENTS, SUCCESS (S) AND FAILURE (F), IS CALLED A BERNOULLI TRIAL. • P(SUCCESS) = p; P(FAILURE) = q = 1 - p

  15. EXAMPLES • TOSSING A COIN 20 TIMES SUCCESS = HEADS WITH p = 0.5 AND FAILURE = TAILS WITH q = 1 – p = 0.5 • TAKING A MULTIPLE CHOICE EXAM UNPREPARED. SUCCESS = CORRECT ANSWER FAILURE = WRONG ANSWER p = 0.2; q = 1 – p = 1 – 0.2 = 0.8

  16. PRODUCTS COMING OUT OF A PRODUCTION LINE SUCCESS = DEFECTIVE ITEMS FAILURE = NON-DEFECTIVE ITEMS • ROLLING A DIE 10 TIMES SUCCESS = GETTING A 6; p = 1/6 FAILURE = NOT GETTING A 6; q = 5/6

  17. AN OFFER FROM A BANK FOR A CREDIT CARD WITH HIGH INTEREST RATE SUCCESS = DECLINE; FAILURE = ACCEPT • HAVING HEALTH INSURANCE SUCCESS = HAVE; FAILURE = NOT HAVE • A REFERENDUM WHETHER TO RECALL AN UNFAITHFUL GOVERNOR FROM OFFICE SUCCESS = VOTE YES; FAILURE = VOTE NO

  18. GEOMETRIC PROBABILITY MODEL • QUESTION: HOW LONG WILL IT TAKE TO ACHIEVE THE FIRST SUCCESS IN A SERIES OF BERNOULLI TRIALS? • THE MODEL THAT GIVES THE PROBABILITIES FOR THE NUMBER OF TRIALS UNTIL THE FIRST SUCCESS IS CALLED THE GEOMETRIC PROBABILITY MODEL.

  19. CONDITIONS • THE FOLLOWING CONDITIONS MUST HOLD BEFORE USING THE GEOMETRIC PROBABILITY MODEL. (1) THE TRIALS MUST BE BERNOULLI, THAT IS, THE RANDOM EXPERIMENT MUST HAVE TWO COMPLEMENTARY OUTCOMES – SUCCESS AND FAILURE; (2) THE TRIALS MUST BE INDEPENDENT OF ONE ANOTHER; (3) THE PROBABILITY OF SUCCESS IS THE SAME FOR EACH TRIAL.

  20. GEOMETRIC PROBABILITY MODEL FOR BERNOULLI TRIALS • LET p = PROBABILITY OF SUCCESS AND q = 1 – p = PROBABILITY OF FAILURE; X = NUMBER OF TRIALS UNTIL FIRST SUCCESS OCCURS. • THEN P(X = x) = q^(x-1) * p FOR x = 1, 2, 3, . . . , AND THE EXPECTED VALUE IS E(X) = 1/p.

  21. EXAMPLE • ASSUME THAT 13% OF PEOPLE ARE LEFT-HANDED. IF WE SELECT 5 PEOPLE AT RANDOM, FIND THE PROBABILITY OF EACH OUTCOME DESCRIBED BELOW. • (1) THE FIRST LEFTY IS THE FIFTH PERSON CHOSEN. ANS = 0.0745 • (2) THE FIRST LEFTY IS THE SECOND OR THIRD PERSON. ANS = 0.211 • (3) IF WE KEEP PICKING PEOPLE UNTIL WE FIND A LEFTY, HOW LONG DO WE EXPECT IT TO TAKE? ANS = 7.69 PEOPLE

  22. EXAMPLE • AN OLYMPIC ARCHER IS ABLE TO HIT THE BULL’S-EYE 80% OF THE TIME. ASSUME EACH SHOT IS INDEPENDENT OF THE OTHERS. IF SHE SHOOTS 6 ARROWS, WHAT’S THE PROBABILITY THAT • (1) HER FIRST BULL’S-EYE COMES ON THE THIRD ARROW? ANS = 0.032 • (2) HER FIRST BULL’S-EYE COMES ON THE FOURTH OR FIFTH ARROW? ANS = 0.00768 • (3) IF SHE KEEPS SHOOTING ARROWS UNTIL SHE HITS THE BULL’S-EYE, HOW LONG DO YOU EXPECT IT WILL TAKE? ANS = 1.25 SHOTS
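
The answers in the two geometric examples can be checked with a short sketch. It assumes the standard geometric model, P(X = x) = q^(x-1) * p with mean 1/p; the function names are hypothetical.

```python
def geometric_pmf(x, p):
    """P(first success occurs on trial x) = q**(x - 1) * p, with q = 1 - p."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    return (1 - p) ** (x - 1) * p


def geometric_mean(p):
    """Expected number of trials until the first success."""
    return 1 / p


# Left-handed example: p = 0.13.
lefty_fifth = geometric_pmf(5, 0.13)
lefty_second_or_third = geometric_pmf(2, 0.13) + geometric_pmf(3, 0.13)
lefty_expected = geometric_mean(0.13)

# Archer example: p = 0.80.
archer_third = geometric_pmf(3, 0.80)
archer_fourth_or_fifth = geometric_pmf(4, 0.80) + geometric_pmf(5, 0.80)
archer_expected = geometric_mean(0.80)

print(f"{lefty_fifth:.4f} {lefty_second_or_third:.3f} {lefty_expected:.2f}")
print(f"{archer_third:.3f} {archer_fourth_or_fifth:.5f} {archer_expected:.2f}")
```

"Second or third" and "fourth or fifth" are unions of disjoint outcomes, so their probabilities simply add.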

  23. BINOMIAL PROBABILITY MODEL FOR BERNOULLI TRIALS • QUESTION: WHAT IS THE NUMBER OF SUCCESSES IN A SPECIFIED NUMBER OF TRIALS? • THE BINOMIAL PROBABILITY MODEL ANSWERS THIS QUESTION, THAT IS, IT GIVES THE PROBABILITY OF EXACTLY k SUCCESSES IN n TRIALS. • CONDITIONS: SAME AS THOSE FOR THE GEOMETRIC PROBABILITY MODEL

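  24. BINOMIAL PROBABILITY MODEL • LET n = NUMBER OF TRIALS; p = PROBABILITY OF SUCCESS; q = 1 – p = PROBABILITY OF FAILURE; X = NUMBER OF SUCCESSES IN n TRIALS. • THEN P(X = k) = C(n, k) * p^k * q^(n-k), WHERE C(n, k) = n! / (k! (n-k)!), WITH MEAN = n*p AND STANDARD DEVIATION = SQRT(n*p*q).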

  25. n! = n(n-1)(n-2)(n-3) … 3 · 2 · 1

  26. EXAMPLES • COMPUTE (1) 3! (2) 4! (3) 5! (4) 6! • COMPUTE

  27. EXAMPLE • ASSUME THAT 13% OF PEOPLE ARE LEFT-HANDED. IF WE SELECT 5 PEOPLE AT RANDOM, FIND THE PROBABILITY OF EACH OUTCOME BELOW. • (1) THERE ARE EXACTLY 3 LEFTIES IN THE GROUP. ANS = 0.0166 • (2) THERE ARE AT LEAST 3 LEFTIES IN THE GROUP. ANS = 0.0179 • (3) THERE ARE NO MORE THAN 3 LEFTIES IN THE GROUP. ANS = 0.9987

  28. EXAMPLE • AN OLYMPIC ARCHER IS ABLE TO HIT THE BULL’S-EYE 80% OF THE TIME. ASSUME EACH SHOT IS INDEPENDENT OF THE OTHERS. IF SHE SHOOTS 6 ARROWS, WHAT’S THE PROBABILITY THAT • (1) SHE GETS EXACTLY 4 BULL’S-EYES? ANS = 0.246 • (2) SHE GETS AT LEAST 4 BULL’S-EYES? ANS = 0.901 • (3) SHE GETS AT MOST 4 BULL’S-EYES? ANS = 0.345 • (4) SHE MISSES THE BULL’S-EYE AT LEAST ONCE? ANS = 0.738 • (5) HOW MANY BULL’S-EYES DO YOU EXPECT HER TO GET? ANS = 4.8 BULL’S-EYES • (6) WITH WHAT STANDARD DEVIATION? ANS = 0.98
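
The binomial answers in the two examples above can be reproduced with the sketch below. It assumes the standard binomial formula C(n, k) * p^k * q^(n-k), and uses `math.comb` (Python 3.8 or later) for the factorial-based coefficient. The function names are hypothetical.

```python
from math import comb, sqrt


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(k, n, p):
    """P(at most k successes in n trials)."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


# Left-handed example: n = 5, p = 0.13.
lefty_exactly_3 = binomial_pmf(3, 5, 0.13)
lefty_at_least_3 = 1 - binomial_cdf(2, 5, 0.13)
lefty_at_most_3 = binomial_cdf(3, 5, 0.13)

# Archer example: n = 6, p = 0.80.
archer_exactly_4 = binomial_pmf(4, 6, 0.80)
archer_at_least_4 = 1 - binomial_cdf(3, 6, 0.80)
archer_at_most_4 = binomial_cdf(4, 6, 0.80)
archer_misses_once = 1 - binomial_pmf(6, 6, 0.80)  # complement of "hits all 6"
archer_mean = 6 * 0.80
archer_sd = sqrt(6 * 0.80 * 0.20)

print(f"{lefty_exactly_3:.4f} {lefty_at_least_3:.4f} {lefty_at_most_3:.4f}")
print(f"{archer_exactly_4:.3f} {archer_at_least_4:.3f} {archer_at_most_4:.3f}")
print(f"{archer_misses_once:.3f} {archer_mean:.1f} {archer_sd:.2f}")
```

"At least" questions are easiest through the complement: P(X ≥ k) = 1 - P(X ≤ k - 1).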

  29. THE NORMAL MODEL TO THE RESCUE OF THE BINOMIAL MODEL • IF n, THE FIXED NUMBER OF TRIALS, IS LARGE, THAT IS, IF n*p ≥ 10 AND n*q ≥ 10, THEN THE BINOMIAL CUMULATIVE PROBABILITIES CAN BE APPROXIMATED BY NORMAL PROBABILITIES WITH THE SAME MEAN (EXPECTED VALUE) = n*p AND THE SAME STANDARD DEVIATION = SQRT(n*p*q)

  30. EXAMPLE • THE TENNESSEE RED CROSS COLLECTED BLOOD FROM 32,000 DONORS. WHAT IS THE PROBABILITY THAT THEY HAD AT LEAST 1850 DONORS OF THE O-NEGATIVE BLOOD GROUP? THE PROBABILITY OF SOMEONE HAVING AN O-NEGATIVE BLOOD TYPE IS 0.06. • SOLUTION: LET X BE THE NUMBER OF DONORS OF THE O-NEGATIVE BLOOD GROUP AMONG THE 32,000. THEN THE QUESTION CAN BE FORMULATED MATHEMATICALLY AS FINDING P(X ≥ 1850).
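
The worked solution is missing from the transcript. One way to carry out the normal approximation for this example is sketched below; it uses no continuity correction, which is an assumption about how the original slide solved it, and the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

n, p = 32_000, 0.06
q = 1 - p

# Check that n is large enough: expect at least 10 successes and 10 failures.
assert n * p >= 10 and n * q >= 10

mean = n * p          # expected number of O-negative donors
sd = sqrt(n * p * q)  # standard deviation of the binomial count

# Normal model with the same mean and standard deviation as the binomial.
approx = NormalDist(mean, sd)
prob_at_least_1850 = 1 - approx.cdf(1850)

print(f"mean = {mean:.0f}, sd = {sd:.2f}")
print(f"P(X >= 1850) is approximately {prob_at_least_1850:.4f}")
```

Since 1850 is about 1.65 standard deviations below the expected count, the probability of reaching at least 1850 comes out at roughly 0.95.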

  31. CHAPTER 6.4 HOW LIKELY ARE THE POSSIBLE VALUES OF A STATISTIC? • REMINDER: A STATISTIC IS A NUMERICAL SUMMARY OF SAMPLE DATA. SOME EXAMPLES ARE THE SAMPLE PROPORTION AND THE SAMPLE MEAN. • DEFINITION: THE SAMPLING DISTRIBUTION OF A STATISTIC IS THE PROBABILITY DISTRIBUTION THAT SPECIFIES PROBABILITIES FOR THE POSSIBLE VALUES THE STATISTIC CAN TAKE.

  32. SAMPLING DISTRIBUTION MODELS FOR PROPORTIONS AND MEANS • SAMPLING DISTRIBUTION MODEL FOR A PROPORTION • PROBLEM FORMULATION: SUPPOSE THAT p IS AN UNKNOWN PROPORTION OF ELEMENTS OF A CERTAIN TYPE S IN A POPULATION. EXAMPLES • PROPORTION OF LEFT-HANDED PEOPLE; • PROPORTION OF HIGH SCHOOL STUDENTS WHO ARE FAILING A READING TEST; • PROPORTION OF VOTERS WHO WILL VOTE FOR MR. X.

  33. ESTIMATION OF p • TO ESTIMATE p, WE SELECT A SIMPLE RANDOM SAMPLE (SRS) OF SIZE, SAY, n = 1000, AND COMPUTE THE SAMPLE PROPORTION. • SUPPOSE THE NUMBER OF ELEMENTS OF THE TYPE WE ARE INTERESTED IN, IN THIS SAMPLE OF n = 1000, IS x = 437. THEN THE SAMPLE PROPORTION IS COMPUTED USING THE FORMULA P(HAT) = x/n.

  34. IN THE EXAMPLE ABOVE, P(HAT) = 437/1000 = 0.437.

  35. WHAT IS THE ERROR OF ESTIMATION? • THAT IS, WHAT IS • WHAT MODEL CAN HELP US FIND THE BEST ESTIMATE OF THE TRUE PROPORTION p? • LET’S START THE ANALYSIS BY FIRST ANSWERING THE SECOND QUESTION.

  36. APPROACH • SUPPOSE THAT WE TAKE A SECOND SAMPLE OF SIZE 1000 AND COMPUTE P(HAT); THE NEW ESTIMATE WILL MOST LIKELY BE DIFFERENT FROM 0.437. NOW TAKE A THIRD SAMPLE, A FOURTH SAMPLE, AND SO ON UNTIL THE TWO THOUSANDTH (2000TH) SAMPLE, EACH OF SIZE 1000. WE WILL THEN HAVE TWO THOUSAND P(HATS) THAT VARY FROM SAMPLE TO SAMPLE, AS ILLUSTRATED IN THE TABLE BELOW.

  37. TABLE OF 2000 SAMPLES, EACH OF SIZE n = 1000, AND THEIR CORRESPONDING P(HATS)

  38. WHAT DO WE DO WITH THE DATA FOR P(HATS)? • WE CONSTRUCT A HISTOGRAM OF THESE 2000 P(HATS). [HISTOGRAM: # OF SAMPLES (VERTICAL AXIS) AGAINST P(HATS) (HORIZONTAL AXIS), WITH p MARKED ON THE HORIZONTAL AXIS]

  39. WHAT WE OBSERVE FROM THE HISTOGRAM • THE HISTOGRAM ABOVE IS AN EXAMPLE OF WHAT WE WOULD GET IF WE COULD SEE ALL THE PROPORTIONS FROM ALL POSSIBLE SAMPLES. THAT DISTRIBUTION HAS A SPECIAL NAME. IT IS CALLED THE SAMPLING DISTRIBUTION OF THE PROPORTIONS. • OBSERVE THAT THE HISTOGRAM IS UNIMODAL, ROUGHLY SYMMETRIC, AND CENTERED AT p, WHICH IS THE TRUE PROPORTION
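
The repeated-sampling experiment described above can be imitated by simulation. In the sketch below the true proportion `TRUE_P = 0.44` is a made-up value chosen only for illustration (in practice p is unknown), and all names are hypothetical. It draws 2000 samples of size 1000 and summarizes the resulting P(HATS).

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)      # fixed seed so the run is repeatable

TRUE_P = 0.44       # hypothetical true proportion, assumed for the simulation
N = 1000            # size of each sample
NUM_SAMPLES = 2000  # number of samples drawn


def sample_proportion(p, n):
    """Draw n independent Bernoulli trials and return the proportion of successes."""
    successes = sum(random.random() < p for _ in range(n))
    return successes / n


p_hats = [sample_proportion(TRUE_P, N) for _ in range(NUM_SAMPLES)]

center = mean(p_hats)   # where the histogram of P(HATS) is centered
spread = stdev(p_hats)  # how much P(HAT) varies from sample to sample
theory_sd = sqrt(TRUE_P * (1 - TRUE_P) / N)  # SQRT(p*q/n) from the normal model

print(f"center of P(HATS) = {center:.4f}")
print(f"spread of P(HATS) = {spread:.4f} (normal model predicts {theory_sd:.4f})")
```

A histogram of `p_hats` would show the unimodal, roughly symmetric shape centered near the assumed p.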

  40. WHAT DOES THE SHAPE OF THE HISTOGRAM REMIND US ABOUT A MODEL THAT MAY JUST BE THE RIGHT ONE FOR SAMPLE PROPORTIONS? • ANSWER: IT IS AMAZING AND FORTUNATE THAT A NORMAL MODEL IS JUST THE RIGHT ONE FOR THE HISTOGRAMS OF SAMPLE PROPORTIONS. • HOW GOOD IS THE NORMAL MODEL? • IT IS GOOD IF THE FOLLOWING ASSUMPTIONS AND CONDITIONS HOLD.

  41. ASSUMPTIONS AND CONDITIONS • ASSUMPTIONS • INDEPENDENCE ASSUMPTION: THE SAMPLED VALUES MUST BE INDEPENDENT OF EACH OTHER. • SAMPLE SIZE ASSUMPTION: THE SAMPLE SIZE, n, MUST BE LARGE ENOUGH. • REMARK: ASSUMPTIONS ARE HARD, OFTEN IMPOSSIBLE, TO CHECK. THAT’S WHY WE ASSUME THEM. FORTUNATELY, SOME CONDITIONS MAY PROVIDE INFORMATION ABOUT THE ASSUMPTIONS.

  42. CONDITIONS • RANDOMIZATION CONDITION: THE DATA VALUES MUST BE SAMPLED RANDOMLY. IF POSSIBLE, USE A SIMPLE RANDOM SAMPLING DESIGN TO SAMPLE THE POPULATION OF INTEREST. • 10% CONDITION: THE SAMPLE SIZE, n, MUST BE NO LARGER THAN 10% OF THE POPULATION OF INTEREST. • SUCCESS/FAILURE CONDITION: THE SAMPLE SIZE HAS TO BE BIG ENOUGH SO THAT WE EXPECT AT LEAST 10 SUCCESSES AND AT LEAST 10 FAILURES. THAT IS, n*p ≥ 10 AND n*q ≥ 10.

  43. THE CENTRAL LIMIT THEOREM FOR THE SAMPLING DISTRIBUTION OF A PROPORTION • FOR A LARGE SAMPLE SIZE n, THE SAMPLING DISTRIBUTION OF P(HAT) IS APPROXIMATELY N(p, SQRT(p*q/n)), THAT IS, P(HAT) IS NORMAL WITH MEAN = p AND STANDARD DEVIATION = SQRT(p*q/n), WHERE q = 1 – p.

  44. EXAMPLE 1 • ASSUME THAT 30% OF STUDENTS AT A UNIVERSITY WEAR CONTACT LENSES • (A) WE RANDOMLY PICK 100 STUDENTS. LET P(HAT) REPRESENT THE PROPORTION OF STUDENTS IN THIS SAMPLE WHO WEAR CONTACTS. WHAT’S THE APPROPRIATE MODEL FOR THE DISTRIBUTION OF P(HAT)? SPECIFY THE NAME OF THE DISTRIBUTION, THE MEAN, AND THE STANDARD DEVIATION. BE SURE TO VERIFY THAT THE CONDITIONS ARE MET. • (B) WHAT’S THE APPROXIMATE PROBABILITY THAT MORE THAN ONE THIRD OF THIS SAMPLE WEAR CONTACTS?

  45. SOLUTION TO EXAMPLE 1

  46. EXAMPLE 2 • INFORMATION ON A PACKET OF SEEDS CLAIMS THAT THE GERMINATION RATE IS 92%. WHAT’S THE PROBABILITY THAT MORE THAN 95% OF THE 160 SEEDS IN THE PACKET WILL GERMINATE? BE SURE TO DISCUSS YOUR ASSUMPTIONS AND CHECK THE CONDITIONS THAT SUPPORT YOUR MODEL. • SOLUTION
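
The solutions to Examples 1 and 2 are not in the transcript. The sketch below shows one way to work both, assuming the usual normal model for a sample proportion (mean p, standard deviation SQRT(p*q/n)) and checking the success/failure condition first. The helper `proportion_model` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist


def proportion_model(p, n):
    """Normal model for P(HAT): mean p, standard deviation sqrt(p*q/n)."""
    q = 1 - p
    # Success/failure condition: expect at least 10 successes and 10 failures.
    if n * p < 10 or n * q < 10:
        raise ValueError("success/failure condition not met")
    return NormalDist(mu=p, sigma=sqrt(p * q / n))


# Example 1: 30% wear contacts, sample of 100 students.
contacts = proportion_model(0.30, 100)
prob_contacts = 1 - contacts.cdf(1 / 3)  # P(P(HAT) > 1/3)

# Example 2: 92% germination rate, packet of 160 seeds.
seeds = proportion_model(0.92, 160)
prob_seeds = 1 - seeds.cdf(0.95)         # P(P(HAT) > 0.95)

print(f"Example 1: mean {contacts.mean:.2f}, sd {contacts.stdev:.4f}, P = {prob_contacts:.4f}")
print(f"Example 2: mean {seeds.mean:.2f}, sd {seeds.stdev:.4f}, P = {prob_seeds:.4f}")
```

The code can only check the success/failure condition. The randomization and 10% conditions depend on how the sample was obtained and have to be argued from the context of each problem.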

  47. CHAPTER 6.5 – 6.6 SAMPLING DISTRIBUTION OF THE SAMPLE MEAN • APPROACH FOR ESTIMATING THE POPULATION MEAN: SAME AS FOR THE SAMPLING DISTRIBUTION FOR PROPORTIONS ILLUSTRATED ABOVE

  48. ASSUMPTIONS AND CONDITIONS • ASSUMPTIONS • INDEPENDENCE ASSUMPTION: THE SAMPLED VALUES MUST BE INDEPENDENT OF EACH OTHER • SAMPLE SIZE ASSUMPTION: THE SAMPLE SIZE MUST BE SUFFICIENTLY LARGE. • REMARK: WE CANNOT CHECK THESE DIRECTLY, BUT WE CAN THINK ABOUT WHETHER THE INDEPENDENCE ASSUMPTION IS PLAUSIBLE.

  49. CONDITIONS • RANDOMIZATION CONDITION: THE DATA VALUES MUST BE SAMPLED RANDOMLY, OR THE CONCEPT OF A SAMPLING DISTRIBUTION MAKES NO SENSE. IF POSSIBLE, USE A SIMPLE RANDOM SAMPLING DESIGN TO OBTAIN THE SAMPLE. • 10% CONDITION: WHEN THE SAMPLE IS DRAWN WITHOUT REPLACEMENT (AS IS USUALLY THE CASE), THE SAMPLE SIZE, n, SHOULD BE NO MORE THAN 10% OF THE POPULATION. • LARGE ENOUGH SAMPLE CONDITION: IF THE POPULATION IS UNIMODAL AND SYMMETRIC, EVEN A FAIRLY SMALL SAMPLE IS OKAY. IF THE POPULATION IS STRONGLY SKEWED, IT CAN TAKE A PRETTY LARGE SAMPLE TO ALLOW USE OF A NORMAL MODEL TO DESCRIBE THE DISTRIBUTION OF SAMPLE MEANS

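  50. CENTRAL LIMIT THEOREM FOR THE SAMPLING DISTRIBUTION FOR MEANS • FOR A LARGE ENOUGH SAMPLE SIZE, n, THE SAMPLING DISTRIBUTION OF THE SAMPLE MEAN IS APPROXIMATELY N(μ, σ/SQRT(n)) • THAT IS, NORMAL WITH MEAN = μ (THE POPULATION MEAN) AND STANDARD DEVIATION = σ/SQRT(n), WHERE σ IS THE POPULATION STANDARD DEVIATION.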
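
To see the theorem at work on a skewed population, the sketch below simulates sample means. The population is a hypothetical exponential distribution with mean 1 and standard deviation 1, chosen only because it is strongly right-skewed. The sample size of 50 and all names are likewise assumptions.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(2)      # fixed seed so the run is repeatable

POP_MEAN = 1.0      # mean of the assumed exponential population
POP_SD = 1.0        # standard deviation of the assumed exponential population
N = 50              # size of each sample
NUM_SAMPLES = 2000  # number of samples drawn


def sample_mean(n):
    """Draw n values from the skewed population and return their mean."""
    return mean([random.expovariate(1.0) for _ in range(n)])


sample_means = [sample_mean(N) for _ in range(NUM_SAMPLES)]

center = mean(sample_means)   # should sit near the population mean
spread = stdev(sample_means)  # should sit near POP_SD / sqrt(N)
theory_sd = POP_SD / sqrt(N)

print(f"center of sample means = {center:.4f} (population mean {POP_MEAN})")
print(f"spread of sample means = {spread:.4f} (theory {theory_sd:.4f})")
```

Rerunning with a smaller `N` makes the skew of the population show through in the sample means, which is why the large enough sample condition matters for skewed populations.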
