
Collecting Data


sharpb

Presentation Transcript


  1. Collecting Data Understanding Random Sampling

  2. Objectives: • To develop the basic properties of collecting an unbiased sample. • To learn to recognize flaws in biased sampling.

  3. Intro… Do you know what it means when something occurs randomly? Randomly select a number from the next slide. Ready…

  4. 1 2 3 4

  5. Question: What would you expect to happen when we collect data on this simple task?

  6. How do we gather data? • Surveys • Opinion polls • Interviews • Studies • Observational • Retrospective (past) • Prospective (future) • Experiments

  7. Population Population – the entire group of individuals we want information about. Census – a complete count of the entire population

  8. Why would we not use a census all the time? • Not accurate • Very expensive • Perhaps impossible • If using destructive sampling, you would destroy the population (breaking strength of soda bottles, lifetime of flashlight batteries, safety ratings for cars)

  9. Sample – a part of the population that we examine in order to gather information. Used to generalize information about a population.

  10. Sampling design – refers to the method used to choose the sample from the population. Sampling frame – a list of every individual in the population.

  11. Simple Random Sample (SRS) – consists of n individuals from the population chosen in such a way that • every individual has an equal chance of being selected • every set of n individuals has an equal chance of being selected
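A minimal sketch of drawing an SRS in Python, assuming the sampling frame is available as a list. The frame contents, the function name `simple_random_sample`, and the `seed` parameter are hypothetical, chosen only for illustration. The standard library's `random.sample` draws without replacement, so every set of n individuals is equally likely to be chosen.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Return n distinct individuals drawn from the sampling frame.

    random.sample draws without replacement, so each individual and
    each set of n individuals has the same chance of selection.
    """
    rng = random.Random(seed)  # seed is optional; fixing it makes the draw repeatable
    return rng.sample(frame, n)

# Hypothetical sampling frame: a list of every individual in the population.
frame = [f"person_{i}" for i in range(1, 21)]

sample = simple_random_sample(frame, 5, seed=1)
print(sample)
```

Fixing the seed is only a convenience for reproducing a draw; leave it as `None` for an actual sample.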

  12. SRS • Advantages: unbiased; easy • Disadvantages: large variance; may not be representative; must have a sampling frame (list of population)

  13. Systematic random sample – select the sample by following a systematic approach; randomly select where to begin

  14. Systematic Random Sample • Advantages: unbiased; ensures that the sample is distributed across the population; more efficient, cheaper, etc. • Disadvantages: large variance; can be confounded by trend or cycle; formulas are complicated

  15. Identify the sampling design: A local restaurant manager wants to survey customers about the service they receive. Each night the manager randomly chooses a number between 1 and 10. He then gives a survey to that customer, and to every 10th customer after them, to fill out before they leave. Answer: Systematic random sampling
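The restaurant example can be sketched in Python as follows. This is an illustration under stated assumptions, not a prescribed method: the customer list, the function name `systematic_sample`, and the `seed` parameter are hypothetical. The code picks a random starting position among the first k individuals and then takes every k-th individual after it.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Pick a random start among the first k individuals, then every k-th after it."""
    rng = random.Random(seed)
    start = rng.randrange(k)   # random index 0..k-1, i.e. customer 1..k
    return frame[start::k]     # that customer and every k-th customer after them

# Hypothetical night with 50 customers, listed in order of arrival.
customers = [f"customer_{i}" for i in range(1, 51)]

picked = systematic_sample(customers, 10, seed=0)
print(picked)
```

Because only the starting point is random, a pattern in arrival order that repeats every k customers would carry straight into the sample, which is the "confounded by trend or cycle" disadvantage listed on the slide.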

  16. Bias – an ERROR that favors certain outcomes. Note: We cannot ever draw conclusions from biased data. Throw it out and start over!

  17. Voluntary response – people choose to respond. Usually only people with very strong opinions respond. Produces biased results.

  18. Convenience sampling – ask people who are easy to ask. Produces biased results.

  19. Source of bias? Suppose that you want to estimate the total amount of money spent by students on textbooks each semester at Rice. You collect register receipts for students as they leave the bookstore during lunch one day. Answer: Convenience sampling – an easy way to collect data

  20. 1970 Draft Lottery and the Role of Randomization: In that first draft lottery (conducted on December 1, 1969), a large, deep, cylindrical bowl was filled with 366 dates, one for each day of the year (including February 29, of course). The dates were placed inside small capsules (balls about the size of a pecan), added to the bowl, and then mixed. After mixing, the capsules were selected, one by one, and assigned a draft priority. Draft registrants whose birthdays matched the first 100 or so dates selected were likely to be called for induction. However, the bowl's small diameter and height (nearly arm's length) made the mixing less than random because each month's dates had been added sequentially in the yearly order of months. January's capsules were dumped in first, followed by February's and so on until December. (Set of Data for 1970 Draft Lottery)

  21. 1970 Draft Lottery

  22. 1970 Draft Number by Day of Year

  23. Mean Draft Number by Month

  24. How did the nonrandomness of the draft affect the casualties (deaths) during the Vietnam War? This was recently studied by Paul Sommers in "The Writing on the Wall", Chance, Vol, 1, 2003, p35-38. He examined the names of the casualties on the Vietnam Memorial (available online at thewall-usa.com) together with other sources and found the number of casualties by birth month:

  25. Selecting a SRS • For the AP exam: “Knowledgeable users of statistics need to be able to perform your sample exactly using the described method.” • Methods: we can “pick samples from a hat”, use a random number generator, or use a table of random digits to derive our sample

  26. SRS by picking out of a hat • Say items in hat are “mixed thoroughly” and state whether or not slips of paper are replaced back in the hat (yes if stratified sampling).

  27. Random digit table • each entry is equally likely to be any of the 10 digits • digits are independent of each other

  28. Suppose your population consisted of these 20 people:
      1) Aidan    6) Fred     11) Kathy    16) Paul
      2) Bob      7) Gloria   12) Lori     17) Shawnie
      3) Chico    8) Hannah   13) Matthew  18) Tracy
      4) Doug     9) Israel   14) Nancy    19) Uncle Sam
      5) Edward   10) Jung    15) Opus     20) Vernon
      Use the following random digits to select a sample of five from these people.
      Row 1: 4 5 1 8 0 5 1 3 7 1
      Row 2: 0 1 5 5 8 0 1 5 7 0
      Row 3: 8 9 9 3 4 3 5 0 6 3
      We will need to use double-digit random numbers, ignoring any number greater than 20. Start with Row 1 and read across. Stop when five people are selected.
      45: Ignore. 18: Tracy. 05: Edward. 13: Matthew. 71: Ignore. 01: Aidan. 55: Ignore. 80: Ignore. 15: Opus.
      So my sample would consist of: Aidan, Edward, Matthew, Opus, and Tracy
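The random digit procedure above can be sketched in Python. This assumes the slide's digits read as three rows of ten (4518051371, 0155801570, 8993435063), which is an interpretation of the flattened table. The function name `select_from_digits` is hypothetical, and skipping a repeated label is an added assumption that this example never triggers.

```python
def select_from_digits(digits, names, n, width=2):
    """Read fixed-width labels from a digit string and pick the matching people.

    Labels outside 1..len(names) are ignored, as are repeats (the repeat
    rule is an assumption; the slide's example has no repeated labels).
    """
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        if 1 <= label <= len(names) and names[label - 1] not in chosen:
            chosen.append(names[label - 1])
            if len(chosen) == n:
                break  # stop when n people are selected
    return chosen

names = ["Aidan", "Bob", "Chico", "Doug", "Edward", "Fred", "Gloria",
         "Hannah", "Israel", "Jung", "Kathy", "Lori", "Matthew", "Nancy",
         "Opus", "Paul", "Shawnie", "Tracy", "Uncle Sam", "Vernon"]

# Rows 1 to 3 of the random digit table, read across.
digits = "4518051371" + "0155801570" + "8993435063"

sample = select_from_digits(digits, names, 5)
print(sample)          # ['Tracy', 'Edward', 'Matthew', 'Aidan', 'Opus']
print(sorted(sample))  # ['Aidan', 'Edward', 'Matthew', 'Opus', 'Tracy']
```

The labels read are 45, 18, 05, 13, 71, 01, 55, 80, 15. Four of them are ignored, and the fifth person is found partway through Row 2.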
