Week 7

Week 7 Sample Means & Proportions

Variability of Summary Statistics • Variability in shape of distn of sample • Variability in summary statistics • Mean, median, st devn, upper quartile, … • Summary statistics have distributions

Parameters and statistics • Parameter describes underlying population • Constant • Greek letter (e.g. μ, σ, π, …) • Unknown value in practice • Summary statistic • Random • Roman letter (e.g. m, s, p, …) • We hope statistic will tell us about corresponding parameter

Distn of sample vs sampling distn of statistic • Values in a single random sample have a distribution • Single sample → single value for statistic • Sample-to-sample variability of statistic is its sampling distribution.

Means • Unknown population mean, μ • Sample mean, X̄, has a distribution: its sampling distribution. • Usually x̄ ≠ μ • A single sample mean, x̄, gives us information about μ

Sampling distribution of mean If sample size, n, increases: • Spread of distn of sample is (approx) same. • Spread of sampling distn of mean gets smaller. • x̄ is likely to be closer to μ • x̄ becomes a better estimate of μ

Sampling distribution of mean Population with mean μ, st devn σ; random sample (n independent values). Sample mean, X̄, has sampling distn with: • Mean = μ • St devn = σ/√n (We will deal later with the problem that μ and σ are unknown in practice.)

Weight loss Estimate mean weight loss for those attending clinic for 10 weeks • Random sample of n = 25 people • Sample mean, x̄. How accurate? Let's see, if the population distn of weight loss has mean μ = 8 lb and st devn σ = 5 lb (population histogram not reproduced).

Some samples Four random samples of n = 25 people: • Mean = 8.32 pounds, st devn = 4.74 pounds • Mean = 8.32 pounds, st devn = 4.74 pounds • Mean = 8.48 pounds, st devn = 5.27 pounds • Mean = 7.16 pounds, st devn = 5.93 pounds N.B. In all samples, x̄ ≠ μ

Sampling distribution Means from simulation of 400 samples (histogram not reproduced). Theory: mean = μ = 8 lb, s.d.(x̄) = σ/√n = 5/√25 = 1 lb (How does this compare to the simulation? To the popn distn?)
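The simulation is easy to repeat. The sketch below is a minimal Python version under an assumption the slides do not state: weight losses are drawn from a normal population with μ = 8 lb and σ = 5 lb (the actual population shape appeared only in a figure). The names `simulate_means`, `MU` and `SIGMA` are illustrative.

```python
import random
import statistics

# Assumption: population of weight losses modelled as normal(mean 8 lb, st devn 5 lb).
MU, SIGMA, N, REPS = 8.0, 5.0, 25, 400

def simulate_means(mu, sigma, n, reps, seed=1):
    """Draw `reps` random samples of size n and return their sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
            for _ in range(reps)]

means = simulate_means(MU, SIGMA, N, REPS)
print(f"simulated: mean = {statistics.fmean(means):.2f}, s.d. = {statistics.stdev(means):.2f}")
print(f"theory:    mean = {MU:.2f}, s.d. = {SIGMA / N ** 0.5:.2f}")
```

Setting N = 100 should shrink the simulated s.d. to roughly 0.5 lb, in line with σ/√n.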

Errors in estimation (figure panels: population; sampling distribution of mean, with mean = μ = 8 lb, s.d.(x̄) = 1 lb) • From 70-95-100 rule • x̄ will almost certainly be within 8 ± 3 lb • x̄ is unlikely to be more than 3 lb in error • Even if we didn't know μ • x̄ is unlikely to be more than 3 lb in error

Increasing sample size, n If we sample n = 100 people instead of 25: s.d.(x̄) = 5/√100 = 0.5 lb. Larger samples give more accurate estimates.

Central Limit Theorem • If population is normal (μ, σ): X̄ is exactly normal (μ, σ/√n) • If popn is non-normal with (μ, σ) but n is large: X̄ is approx normal (μ, σ/√n) Guideline: n > 30 is large enough even if popn is very non-normal
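The guideline can be checked by simulation. This sketch assumes a strongly right-skewed exponential population with mean 1 and st devn 1 (an illustrative choice, not from the slides) and looks at means of n = 40 values; the CLT predicts they are roughly normal with mean 1 and st devn 1/√40 ≈ 0.158.

```python
import random
import statistics

def sample_means(draw, n, reps, rng):
    """Return `reps` sample means, each from n independent draws of draw(rng)."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

def exponential(rng):
    """One value from a very right-skewed population: exponential, mean 1, st devn 1."""
    return rng.expovariate(1.0)

rng = random.Random(7)
n = 40
means = sample_means(exponential, n, 2000, rng)

m = statistics.fmean(means)
s = statistics.stdev(means)
# For a normal distn, about 95% of values lie within 2 st devns of the mean.
within_2sd = sum(abs(x - m) < 2 * s for x in means) / len(means)
print(f"mean {m:.3f} (theory 1), s.d. {s:.3f} (theory {1 / n ** 0.5:.3f})")
print(f"share within 2 s.d.: {within_2sd:.1%}")
```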

Other summary statistics E.g. lower quartile, proportion, correlation • Usually not normal distns • A formula for the st devn of the sampling distn is sometimes available • Sampling distn usually close to normal if n is large

Lottery problem Pennsylvania Cash 5 lottery • 5 numbers selected from 1-39 • Pick birthdays of family members (none 32-39) • P(highest selected is 32 or over)? Statistic: H = highest of 5 random numbers (without replacement)

Lottery simulation Theory? Fairly hard. Simulation: Generated 5 numbers (without replacement) 1560 times Highest number > 31 in about 72% of repetitions
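The simulation is easy to reproduce. As a check on the 72% figure, the exact answer also follows from a short counting argument: H ≤ 31 exactly when all five numbers come from 1 to 31, so P(H ≥ 32) = 1 − C(31, 5)/C(39, 5) ≈ 0.705. The sketch below (Python, arbitrary seed, illustrative names) does both.

```python
import math
import random

def highest_of_five(rng):
    """Highest of 5 numbers drawn without replacement from 1..39."""
    return max(rng.sample(range(1, 40), 5))

rng = random.Random(2024)
reps = 1560
hits = sum(highest_of_five(rng) >= 32 for _ in range(reps))
print(f"simulated P(H >= 32) = {hits / reps:.3f}")

# Exact: H <= 31 exactly when all 5 numbers come from 1..31.
exact = 1 - math.comb(31, 5) / math.comb(39, 5)
print(f"exact     P(H >= 32) = {exact:.4f}")  # 0.7049
```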

Normal distributions • Family of distributions (populations) • Shape depends only on parameters μ (mean) & σ (st devn) • All have same symmetric 'bell shape' • Example (figure): μ = 65 inches, σ = 2.7 inches

Importance of normal distn • A reasonable model for many data sets • Transformed data often approx normal • Sample means (and many other statistics) are approx normal.

Standard normal distribution • Z ~ Normal (μ = 0, σ = 1) (figure: density drawn over z = −3 to 3) • Prob (Z < z*) = area under the curve to the left of z*

Probabilities for normal (0, 1) Check from tables:
P(Z ≤ −3.00) = 0.0013
P(Z ≤ −2.59) = 0.0048
P(Z ≤ 1.31) = 0.9049
P(Z ≤ 2.00) = 0.9772
P(Z ≤ −4.75) = 0.000001

Probability Z > 1.31 P(Z > 1.31) = 1 − P(Z ≤ 1.31) = 1 − 0.9049 = 0.0951

Prob (Z between −2.59 and 1.31) P(−2.59 ≤ Z ≤ 1.31) = P(Z ≤ 1.31) − P(Z ≤ −2.59) = 0.9049 − 0.0048 = 0.9001
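Table look-ups like these can be checked with Python's standard-library `statistics.NormalDist` (Python 3.8+). A minimal sketch:

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # standard normal

for z in (-3.00, -2.59, 1.31, 2.00, -4.75):
    print(f"P(Z <= {z:5.2f}) = {Z.cdf(z):.6f}")

upper = 1 - Z.cdf(1.31)                # P(Z > 1.31), via the complement
between = Z.cdf(1.31) - Z.cdf(-2.59)   # P(-2.59 <= Z <= 1.31), difference of two areas
print(f"P(Z > 1.31) = {upper:.4f}")
print(f"P(-2.59 <= Z <= 1.31) = {between:.4f}")
```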

Standard devns from mean • Normal (μ, σ) (figure not reproduced) • Heights of students • μ = 65 inches, σ = 2.7 inches

Probability and area X ~ normal (μ = 65, σ = 2.7) P(X ≤ 67.7) = area under the density curve to the left of 67.7

Probability and area (cont.) Exact values behind the 70-95-100 rule, for X ~ Normal (μ, σ): • P(X within σ of μ) = 0.683, approx 70% • P(X within 2σ of μ) = 0.954, approx 95% • P(X within 3σ of μ) = 0.997, approx 100%
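The exact figures behind the 70-95-100 rule are the same for every normal distribution, so they can be computed from the standard normal. A short sketch using `statistics.NormalDist`; `prob_within` is an illustrative name:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal; any Normal(mu, sigma) gives the same answers

def prob_within(k):
    """P(X within k st devns of its mean) for a normal X."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} s.d.: {prob_within(k):.3f}")
```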

Finding approx probabilities Ht of college woman, X ~ normal (μ = 65, σ = 2.7) Prob (X ≤ 62)? Sketch normal density; estimate area. P(X ≤ 62) = area, about 1/8

Translate question from X to Z Translate to z-score: • Z ~ Normal (μ = 0, σ = 1) • X ~ Normal (μ, σ) • Find P(X ≤ x*) = P(Z ≤ z*), where z* = (x* − μ)/σ (figure: x* on the X scale lines up with z* on the Z scale from −3 to 3)

Finding probabilities Prob (height of randomly selected college woman ≤ 62)? z* = (62 − 65)/2.7 = −1.11, so P(X ≤ 62) = P(Z ≤ −1.11) = 0.1335. About 13%.

Prob (X > value) Ht of college woman, X ~ normal (μ = 65, σ = 2.7) Prob (X > 68 inches)?
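The slide leaves P(X > 68) open. Since 68 is as far above 65 as 62 is below it, symmetry gives the same answer as P(X ≤ 62), about 13% (without rounding z to two decimals the value is 0.1333 rather than the table's 0.1335). The sketch below standardizes with z = (x − μ)/σ and then uses the standard normal CDF; `z_score` is a hypothetical helper name.

```python
from statistics import NormalDist

MU, SIGMA = 65.0, 2.7   # heights of college women, inches

def z_score(x, mu, sigma):
    """Number of st devns that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

Z = NormalDist()                              # standard normal
p_low = Z.cdf(z_score(62, MU, SIGMA))         # P(X <= 62), z about -1.11
p_high = 1 - Z.cdf(z_score(68, MU, SIGMA))    # P(X > 68),  z about +1.11
print(f"P(X <= 62) = {p_low:.4f}")
print(f"P(X > 68)  = {p_high:.4f}")

# NormalDist can also work on the original scale directly:
print(f"{NormalDist(MU, SIGMA).cdf(62):.4f}")
```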

Finding upper quartile Blood pressures are normal with mean 120 and standard deviation 10. What is the 75th percentile? Step 1: Solve for the z-score: the closest z* with area 0.7500 below it (tables) is z* = 0.67. Step 2: Calculate x = z*σ + μ: x = (0.67)(10) + 120 = 126.7, or about 127.
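The two steps map directly onto the inverse CDF. A sketch with `statistics.NormalDist.inv_cdf`; the exact z* is 0.6745, so without table rounding the percentile is still 126.7 to one decimal.

```python
from statistics import NormalDist

bp = NormalDist(mu=120, sigma=10)   # blood pressures

# Step 1: z* with area 0.75 below it (tables give 0.67).
z_star = NormalDist().inv_cdf(0.75)
# Step 2: back to the original scale, x = z* * sigma + mu.
x = z_star * bp.stdev + bp.mean
print(f"z* = {z_star:.4f}, 75th percentile = {x:.1f}")

# One-step equivalent on the original scale:
print(f"{bp.inv_cdf(0.75):.1f}")
```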

Probabilities about means • Blood pressure ~ normal (μ = 120, σ = 10) • 8 people given drug • If drug does not affect blood pressure, find P(average blood pressure > 130)

P(X̄ > 130)? • X ~ normal (μ = 120, σ = 10) • n = 8, so X̄ ~ normal (μ = 120, σ/√n = 10/√8 = 3.54) • z = (130 − 120)/3.54 = 2.83 • prob = 0.0023 Very little chance!
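The key step is using σ/√n, not σ, for the spread of X̄. A minimal sketch:

```python
import math
from statistics import NormalDist

mu, sigma, n = 120, 10, 8
se = sigma / math.sqrt(n)          # st devn of the sample mean, about 3.54
xbar_dist = NormalDist(mu, se)     # sampling distn of the mean of 8 readings
p = 1 - xbar_dist.cdf(130)         # P(mean of 8 > 130)
print(f"s.d.(mean) = {se:.2f}, P = {p:.4f}")
```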

Distribution of sum X ~ distn with (μ, σ) ⇒ aX ~ distn with (aμ, aσ) for a > 0, e.g. miles to kilometers • Sum of n independent values, ΣX = nX̄ ~ distn with (nμ, √n σ) ⇒ Central Limit Theorem implies approx normal

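Probabilities about sum • Profit in 1 day ~ normal (μ = $300, σ = $200) • Prob(total profit in week < $1,000)? • Total over 7 days ~ normal (7 × $300 = $2,100, √7 × $200 = $529) • z = (1,000 − 2,100)/529 = −2.08 • Prob = 0.0188 Assumes independence of daily profits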
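A sketch of the weekly-total calculation, assuming (as the 0.0188 answer implies) a 7-day week and independent daily profits:

```python
import math
from statistics import NormalDist

daily_mu, daily_sigma, days = 300, 200, 7
# Sum of independent normals: mean adds up, st devn grows with sqrt(days).
total = NormalDist(days * daily_mu, math.sqrt(days) * daily_sigma)
p = total.cdf(1000)
print(f"total ~ normal({total.mean:.0f}, {total.stdev:.1f}); P(total < 1000) = {p:.4f}")
```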

Categorical data • Most important parameter is π = Prob (success) • Corresponding summary statistic is p = proportion of successes • N.B. Textbook uses p (for π) and p̂ (for p)

Number of successes • Easiest to deal with count of successes before proportion. • If… • 1. n "trials" (fixed beforehand). • 2. Only "success" or "failure" possible for each trial. • 3. Outcomes are independent. • Prob (success) remains the same for all trials, π. • Prob (failure) is 1 − π. • Then X = number of successes ~ binomial (n, π)

Examples

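Binomial probabilities P(X = k) = [n!/(k!(n − k)!)] π^k (1 − π)^(n − k) for k = 0, 1, 2, …, n. You won't need to use this!! Prob (win game) = 0.2; plays of game are independent. What is Prob (wins 2 out of 3 games)? With X ~ binomial (n = 3, π = 0.2), what is P(X = 2)? P(X = 2) = 3 × 0.2² × 0.8 = 0.096.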
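Even though the formula is not needed for hand calculation, it is short to code. A sketch with `math.comb` (Python 3.8+); `binomial_pmf` is an illustrative name:

```python
from math import comb

def binomial_pmf(k, n, pi):
    """P(X = k) for X ~ binomial(n, pi)."""
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

# Wins in 3 independent plays, each won with probability 0.2.
p2 = binomial_pmf(2, n=3, pi=0.2)
print(f"P(X = 2) = {p2:.3f}")   # 3 * 0.2**2 * 0.8 = 0.096
print([round(binomial_pmf(k, 3, 0.2), 3) for k in range(4)])
```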

Mean & st devn of Binomial For a binomial (n, π): • Mean = nπ • St devn = √(nπ(1 − π))

Extraterrestrial Life? 50% of large population would say "yes" if asked, "Do you believe there is extraterrestrial life?" Sample of n = 100 X = # "yes" ~ binomial (n = 100, π = 0.5)

Extraterrestrial Life? Sample of n = 100 X = # "yes" ~ binomial (n = 100, π = 0.5), so mean = nπ = 50 and st devn = √(100 × 0.5 × 0.5) = 5. 70-95-100 rule of thumb for # "yes" • About 95% chance of between 40 & 60 • Almost certainly between 35 & 65
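The rule of thumb can be compared with exact binomial sums. The sketch below adds up exact binomial probabilities; because the intervals include their endpoints, the first comes out a little above 95%. Names are illustrative.

```python
import math
from math import comb

n, pi = 100, 0.5
mean = n * pi                          # 50
sd = math.sqrt(n * pi * (1 - pi))      # 5

def pmf(k):
    """P(X = k) for X ~ binomial(n, pi)."""
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

def prob_between(lo, hi):
    """Exact P(lo <= X <= hi)."""
    return sum(pmf(k) for k in range(lo, hi + 1))

print(f"mean = {mean}, sd = {sd}")
print(f"P(40 <= X <= 60) = {prob_between(40, 60):.4f}")
print(f"P(35 <= X <= 65) = {prob_between(35, 65):.4f}")
```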

Normal approx to binomial If X is binomial (n, π) and n is large, then X is also approximately normal, with mean nπ and st devn √(nπ(1 − π)). Conditions: Both nπ and n(1 − π) are at least 10. (Justified by Central Limit Theorem)

Number of H in 30 Flips X = # heads in n = 30 flips of fair coin; X ~ binomial (n = 30, π = 0.5) Bell-shaped & approx normal.

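Opinion poll n = 500 adults; 240 agreed with statement. If π = 0.5 of all adults agree, what is P(X ≤ 240)? X is approx normal with mean nπ = 250 and st devn √(500 × 0.5 × 0.5) = 11.18, so z = (240 − 250)/11.18 = −0.89 and P(X ≤ 240) ≈ 0.19. Not unlikely to see 48% or less, even if 50% in population agree.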
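To see how good the normal approximation is here, the sketch below compares it with the exact binomial probability. Because π = 0.5, every outcome has probability C(n, k)/2ⁿ, which keeps the exact sum in integer arithmetic until the final division. No continuity correction is applied, so the two answers differ slightly; either way the probability is far from small, which is the slide's point.

```python
import math
from math import comb
from statistics import NormalDist

n, pi, x = 500, 0.5, 240
mean, sd = n * pi, math.sqrt(n * pi * (1 - pi))   # 250 and about 11.18

approx = NormalDist(mean, sd).cdf(x)                      # normal approximation
exact = sum(comb(n, k) for k in range(x + 1)) / 2**n      # exact; valid because pi = 0.5
print(f"normal approx {approx:.4f}, exact binomial {exact:.4f}")
```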

Sample Proportion • Suppose (unknown to us) 40% of a population carry the gene for a disease (π = 0.40). • Random sample of 25 people; X = # with gene. • X ~ binomial (n = 25, π = 0.4) • p = X/25 = proportion with gene

Distn of sample proportion • X ~ binomial (n, π), p = X/n • p has mean π and st devn √(π(1 − π)/n) • Large n: p is approx normal (nπ ≥ 10 & n(1 − π) ≥ 10)

Examples • Election polls: to estimate proportion who favor a candidate; units = all voters. • Television ratings: to estimate proportion of households watching TV program; units = all households with TV. • Consumer preferences: to estimate proportion of consumers who prefer new recipe compared with old; units = all consumers. • Testing ESP: to estimate probability a person can successfully guess which of 5 symbols is on a hidden card; repeatable situation = a guess.

Public opinion poll Suppose 40% of all voters favor Candidate A. Pollsters sample n = 2400 voters. Propn voting for A is approx normal with mean π = 0.40 and st devn √(0.4 × 0.6/2400) = 0.01. Simulation of 400 polls & theory (figure not reproduced).

Probability from normal approx If 40% of voters favor Candidate A, and n = 2400 sampled Sample proportion, p, is almost certain to be between 0.37 and 0.43 Prob 0.95 of p being between 0.38 and 0.42
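A sketch of the "400 simulated polls" idea, assuming each voter independently favors A with probability 0.40 (seed and names are arbitrary):

```python
import random
import statistics

PI, N, REPS = 0.40, 2400, 400
rng = random.Random(11)

def poll(rng):
    """Sample proportion favoring A in one simulated poll of N voters."""
    return sum(rng.random() < PI for _ in range(N)) / N

props = [poll(rng) for _ in range(REPS)]
sd_theory = (PI * (1 - PI) / N) ** 0.5      # about 0.01
in_2sd = sum(0.38 <= p <= 0.42 for p in props) / REPS
in_3sd = sum(0.37 <= p <= 0.43 for p in props) / REPS
print(f"s.d.: simulated {statistics.stdev(props):.4f}, theory {sd_theory:.4f}")
print(f"within 0.38-0.42: {in_2sd:.1%}; within 0.37-0.43: {in_3sd:.1%}")
```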
