Lecture 19

tyler

Presentation Transcript


  1. Lecture 19

  2. Parameters and statistics • Example: A random sample of 1014 voters is asked whether they think the President is too liberal, too conservative, or about right. • 514 (51%) say ‘too liberal’. • The observed percentage in the sample, 51%, is a statistic. • The unknown percentage of the population who would say ‘too liberal’ is a parameter. • Presumably it’s close to 51%, but we will never know it exactly. • Question we’d like to ask: “How close to the parameter is the statistic likely to be?”

  3. Sampling experiment • http://demonstrations.wolfram.com/UrnProblem/

  4. Experiment • Population (urn): our class (22 people) • Question: Dogs vs Cats • We will sample 5 people • Assign a number to each person • Use R to draw the sample • Repeat 5 times • Ascertain the truth and run R
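
The class experiment can be sketched in a few lines. The lecture used R; this is a Python stand-in using only the standard library, and the names `CLASS_SIZE`, `SAMPLE_SIZE`, `REPEATS`, and `draw_samples` are made up for illustration. It numbers the 22 people, draws 5 without replacement, and repeats the draw 5 times.

```python
import random

CLASS_SIZE = 22   # population (urn): the class
SAMPLE_SIZE = 5   # people drawn per sample
REPEATS = 5       # number of repeated samples

def draw_samples(seed=None):
    """Return REPEATS simple random samples of student numbers 1..CLASS_SIZE."""
    rng = random.Random(seed)
    students = list(range(1, CLASS_SIZE + 1))  # assign numbers 1..22
    # rng.sample draws without replacement, like pulling balls from an urn
    return [sorted(rng.sample(students, SAMPLE_SIZE)) for _ in range(REPEATS)]

# The seed is arbitrary and fixed only so the run is reproducible.
samples = draw_samples(seed=19)
for i, s in enumerate(samples, start=1):
    print(f"sample {i}: {s}")
```

Each sample would then be compared with the true class split (Dogs vs Cats) to see how much the sample proportion moves from draw to draw.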

  5. US Polls • The 2012 estimate of the number of eligible voters is 206,072,000. • We sampled 1014 people at random and got 514 yes answers (‘too liberal’).

  6. Main idea • We learned that given a model we can check if the data is consistent with it • Idea: Find models that are consistent with the data.

  7. US Polls • The 2012 estimate of the number of eligible voters is 206,072,000. • We sampled 1014 people at random and got 514 yes answers (‘too liberal’). • We will consider various models: • True proportion is p = 0.05, 0.10, 0.15, … • Which of the models is the data in agreement with?

  8. Results

  9. “P-value” • For each model, consider the proportion of fake samples with fewer than 514 yes answers • Proportions close to 0 or 1 mean the data are not consistent with that model
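
A rough sketch of this check, in Python rather than the R used in class. It assumes each of the 1014 answers can be simulated as an independent yes/no draw with probability p, which is reasonable because the sample is tiny relative to 206,072,000 voters. `N_FAKE`, `fake_count`, and `proportion_below` are illustrative names, and 300 fake samples per model is an arbitrary choice.

```python
import random

N_SAMPLE = 1014    # poll size from the slides
OBSERVED = 514     # observed yes answers
N_FAKE = 300       # fake samples per model (assumed; more gives smoother values)

def fake_count(p, rng):
    """One fake poll: number of yes answers among N_SAMPLE voters if the true proportion is p."""
    return sum(rng.random() < p for _ in range(N_SAMPLE))

def proportion_below(p, rng):
    """Share of fake samples with fewer than OBSERVED yes answers under model p."""
    below = sum(fake_count(p, rng) < OBSERVED for _ in range(N_FAKE))
    return below / N_FAKE

rng = random.Random(2012)  # arbitrary seed, for reproducibility only
models = [round(0.05 * k, 2) for k in range(1, 20)]  # p = 0.05, 0.10, ..., 0.95
p_values = {p: proportion_below(p, rng) for p in models}
for p, v in p_values.items():
    print(f"p = {p:.2f}: proportion of fake samples < 514 = {v:.3f}")
```

Models far below 0.5 should give a proportion of 1 (every fake sample falls short of 514) and models far above 0.5 a proportion of 0, so only models near p = 0.5 remain plausible.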

  10. Cutoff • Use a finer grid of models for better resolution • A usual cutoff is 0.05, split between both sides: keep the models whose proportion lies between .025 and .975 • Models selected: p in [.477, .538]
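
The cutoff rule can be illustrated as follows. Note the swap: instead of simulating fake samples, this sketch computes the exact binomial probability of fewer than 514 yes answers, which is what the simulated proportion approximates. The grid spacing of 0.001 and the names `prob_below`, `grid`, and `selected` are assumptions for illustration.

```python
import math

N_SAMPLE = 1014
OBSERVED = 514
LOW, HIGH = 0.025, 0.975   # the 0.05 cutoff split between both sides

def prob_below(p, n=N_SAMPLE, x=OBSERVED):
    """Exact P(X < x) for X ~ Binomial(n, p), with each term computed in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(x):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)  # guard against rounding slightly above 1

grid = [k / 1000 for k in range(400, 601)]   # p = 0.400, 0.401, ..., 0.600
selected = [p for p in grid if LOW <= prob_below(p) <= HIGH]
interval = (min(selected), max(selected))
print(f"models kept: [{interval[0]:.3f}, {interval[1]:.3f}]")
```

Because the slide's interval came from simulation, the endpoints here should land close to, but not necessarily exactly on, the values shown on the slide.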

  11. Fake data • 3 sub-groups • 42,343,562 (R) • 59,280,986 (D) • 104,447,452 (I) • We sample roughly in proportion: • Sampled 220, 201 yes (R) • Sampled 284, 31 yes (D) • Sampled 510, 282 yes (I)
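
A quick check of the "roughly in proportion" claim, using only the numbers on this slide. The `strata` dictionary layout is just an illustrative way to hold them.

```python
# group: (population size, number sampled, sampled yes answers), all from the slide
strata = {
    "R": (42_343_562, 220, 201),
    "D": (59_280_986, 284, 31),
    "I": (104_447_452, 510, 282),
}

total_pop = sum(pop for pop, _, _ in strata.values())
total_n = sum(n for _, n, _ in strata.values())
total_yes = sum(yes for _, _, yes in strata.values())

for name, (pop, n, yes) in strata.items():
    print(f"{name}: population share {pop / total_pop:.3f}, "
          f"sample share {n / total_n:.3f}, yes rate {yes / n:.3f}")

# The three groups add up to the totals used earlier: 206,072,000 voters, 1014 sampled, 514 yes.
print(total_pop, total_n, total_yes)
```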

  12. Issue • Too many levers to “fiddle” (three different Ks, one for each group) • Cannot simply look at how well the data fits. • Needs more sophisticated statistics

  13. Naïve parametric bootstrap • Compute a number/numbers estimated from the data (point estimate). • Use this number to simulate a lot of fake data and see how the fake data can vary. • Use this variability to estimate the uncertainty in our estimator
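
The three steps above can be sketched for the simple random sample of 1014 voters. This is a Python stand-in, not the code used in class. It assumes independent yes/no draws and a percentile interval taken from the sorted fake estimates; `N_BOOT` and `parametric_bootstrap` are illustrative names.

```python
import random

N_SAMPLE, OBSERVED = 1014, 514
N_BOOT = 1000   # number of fake data sets (assumed)

def parametric_bootstrap(n=N_SAMPLE, yes=OBSERVED, n_boot=N_BOOT, seed=0):
    """Return the point estimate and an approximate 95% percentile interval."""
    rng = random.Random(seed)
    p_hat = yes / n  # step 1: point estimate from the data
    # step 2: simulate fake polls as if p_hat were the true proportion
    fake = sorted(sum(rng.random() < p_hat for _ in range(n)) / n
                  for _ in range(n_boot))
    # step 3: the spread of the fake estimates measures the uncertainty
    lo = fake[int(0.025 * n_boot)]
    hi = fake[int(0.975 * n_boot) - 1]
    return p_hat, (lo, hi)

p_hat, (lo, hi) = parametric_bootstrap()
print(f"p_hat = {p_hat:.3f}, bootstrap interval = ({lo:.3f}, {hi:.3f})")
```

Multiplying the interval endpoints by the population size of 206,072,000 would give the matching interval for the number of voters K.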

  14. Original problem • Estimated K (the number of voters in the population who would say yes) = 206,072,000 * (514/1014) = 104,458,588 • Estimated p in (.476, .537)

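  15. Stratified problem • Estimate K for each group, then combine to get a joint p estimate • p_combined = .20*.91 + .29*.11 + .51*.55 ≈ .50 (computed with unrounded weights and rates; the rounded figures shown here give .49) • This is not much different from the estimate that ignores stratification (.51) • There is a (small) gain in precision • Bootstrap interval (.474, .525), slightly narrower than the unstratified (.476, .537)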
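
A sketch of the stratified version, again in Python and under the assumption of independent yes/no draws within each group. Each group's yes rate is weighted by its share of the population, and the bootstrap resimulates every group at its own estimated rate. `stratified_bootstrap` and the `strata` layout are illustrative names.

```python
import random

# group: (population size, number sampled, sampled yes answers), from the fake-data slide
strata = {
    "R": (42_343_562, 220, 201),
    "D": (59_280_986, 284, 31),
    "I": (104_447_452, 510, 282),
}
total_pop = sum(pop for pop, _, _ in strata.values())

# Combined estimate: population-share weights times within-group yes rates.
p_combined = sum((pop / total_pop) * (yes / n) for pop, n, yes in strata.values())

def stratified_bootstrap(n_boot=1000, seed=0):
    """Approximate 95% percentile interval for the combined proportion."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        est = 0.0
        for pop, n, yes in strata.values():
            p_hat = yes / n
            fake_yes = sum(rng.random() < p_hat for _ in range(n))  # fake poll in this group
            est += (pop / total_pop) * (fake_yes / n)
        estimates.append(est)
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]

lo, hi = stratified_bootstrap()
print(f"p_combined = {p_combined:.3f}, bootstrap interval = ({lo:.3f}, {hi:.3f})")
```

With unrounded inputs the combined estimate comes out just under 0.50, and the interval should be close to the one quoted on the slide, up to simulation noise.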

  16. Bootstrap Stratified

  17. Bootstrap SRS
