
The Scientific Study of Politics (POL 51)

Professor B. Jones, University of California, Davis


Presentation Transcript


  1. The Scientific Study of Politics (POL 51) Professor B. Jones University of California, Davis

  2. Today • Sampling Plans • Survey Research

  3. Populations • Key Concepts • Population • Defined by the research • “All U.S. citizens age 18 or older.” • All democratic countries • Counties in the United States • Characteristics of a Population • Bounded and definable • If you can’t define the population, you probably don’t have a well-formed research question!

  4. Populations vs. Samples • Populations are often unattainable • TOO BIG (U.S. population) • Very Costly to Obtain • May not be necessary • The beauty of statistical theory • Samples • Simply Defined: a subset of the population chosen in some manner • How you choose is the important question!

  5. Moving Parts of a Sample • Units of Analysis • J is the population • i is a member of J • Then i is a “sample element” • Sampling Frames • The actual source of the data • Literary Digest Poll (1936) • “Dewey Defeats Truman” (1948) • Exit Polls

  6. More Moving Parts • Sampling Unit • Could be same as sample element (Unit of Analysis) • But it could be collections of elements (cluster, stratified sampling) • Sampling Plan • Random? Nonrandom?

  7. Kinds of Samples • Simple Random Sample • Major Characteristic: Every sample element has an equal probability of selection. • If done properly, maximizes the likelihood of a representative sample. • What if your assumptions of randomness go badly? • Nonrandom samples (often) produce nonrepresentative surveys.

  8. Why Randomness is Goodness • Nonprobability Sampling • Probability of “getting into” the sample is unknown • All bets are off; inference most likely impossible • Highly unreliable! • Simple Random Sampling • Every sample element has the same probability of being selected: Pr(selection)=1/N • In practice, not always easy to guarantee or achieve • An Example of a Bad Assumption

  9. Some Data

  10. More Data

  11. Getting Probability Samples Wrong

  12. Draft Lottery • Simple random sampling did not exist. • Its absence had profound consequences. • Avg. Lottery Number Jan.-June: 206 • Avg. Lottery Number July-Dec.: 161 • Avg. Deaths Jan.-June: 159 • Avg. Deaths July-Dec.: 111 • Differences highly significant. • Randomness should have ensured an equal chance of draft, invariant to birth date. It didn’t. • By analogy, suppose college admissions were based on this kind of lottery… • Those of you born later in the year would be less likely to be admitted. • Would you consider that fair?

  13. How to Achieve Randomness • Random number generation • Modern computers are really good at this. • Assign sample elements a number • Generate a random numbers table • Use a decision rule upon which to select sample. • The Key: sampled units are randomly drawn. • Why Important? Randomness helps ensure REPRESENTATIVENESS! • Absent this, all bets are off: • Convenience Polls • Push Polls • Person-on-the-Street Interviews
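
The steps on this slide (number the elements, generate random numbers, apply a decision rule) can be sketched in a few lines of R. This is only an illustration under assumptions: the frame of 500 elements, the sample size of 50, and the seed are all hypothetical and do not come from the lecture.

```r
# Hypothetical sampling frame: 500 numbered elements (the size is an assumption)
set.seed(2024)                      # arbitrary seed, used only for reproducibility
frame <- data.frame(id = 1:500)

# Step 1: attach a random number to every element
frame$u <- runif(nrow(frame))

# Step 2: decision rule: keep the 50 elements with the smallest random numbers
n <- 50
srs <- frame[order(frame$u)[1:n], ]

# Shortcut: sample() draws the same kind of sample in a single call
srs_ids <- sample(frame$id, size = n, replace = FALSE)
```

Either route gives every element the same chance of ending up in the sample, which is the property the slide calls the key.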

  14. A Population and Some “Samples” • A “Population” • Striations represent “attitudes” • Some “Samples”

  15. Other Kinds of Sampling Strategies • Stratified Samples: a probability sample in which elements sharing some characteristic are grouped and then sample elements are randomly chosen from each group. • Benefit? Can ensure more representative sample with smaller sample sizes. • Why might this be the case?

  16. Sampling comes to life in…R!!! • Suppose we have a population of about 100,000 (the four groups below actually sum to 99,000) • And in that population, we have 4 groups • Group 1: 13,000 (13 percent) • Group 2: 12,000 (12 percent) • Group 3: 4,000 (4 percent) • Group 4: 70,000 (70 percent) • Racial/Ethnic Characteristics in the US: US Census • White (69.13 percent) • Black (12.06 percent) • Hispanic (12.55 percent) • Asian (3.6 percent) • Some R Code

  17. R

          #Creating a population consisting of 4 groups
          #(the group sizes sum to 99,000, not 100,000)
          set.seed(535126235)
          population<- rep(1:4,c(13000, 12000, 4000, 70000))
          #Tabulating the population (ctab requires package catspec)
          #The percents are not whole numbers because the denominator is 99,000
          ctab(table(population))

                     Count     Total %
          population
          1          13000.00    13.13
          2          12000.00    12.12
          3           4000.00     4.04
          4          70000.00    70.71

  18. Sampling • What do we expect from random sampling? • That each sample reproduces the population proportions. • Let’s consider SIMPLE RANDOM SAMPLES. • Also, let’s consider small samples (size 100) • …which is about a 0.1 percent sample (a sampling fraction of roughly .001).

  19. R: 3 samples of n=100

          #Three Simple Random Samples without Replacement; n=100, which is about a 0.1 percent sample
          #The set.seed command ensures I can exactly replicate the simulations
          set.seed(15233)
          srs1<-sample(population, size=100, replace=FALSE)
          ctab(table(srs1))
          set.seed(5255563)
          srs2<-sample(population, size=100, replace=FALSE)
          ctab(table(srs2))
          set.seed(5255)
          srs3<-sample(population, size=100, replace=FALSE)
          ctab(table(srs3))

  20. R: Sample Results

          > set.seed(15233)
          > srs1<-sample(population, size=100, replace=FALSE)
          > ctab(table(srs1))
               Count  Total %
          srs1
          1       19       19
          2       13       13
          3        5        5
          4       63       63
          > set.seed(5255563)
          > srs2<-sample(population, size=100, replace=FALSE)
          > ctab(table(srs2))
               Count  Total %
          srs2
          1       16       16
          2        8        8
          3        4        4
          4       72       72
          > set.seed(5255)
          > srs3<-sample(population, size=100, replace=FALSE)
          > ctab(table(srs3))
               Count  Total %
          srs3
          1       12       12
          2        9        9
          3        1        1
          4       78       78

  21. Implications? • Small samples? • Variability in proportion of groups. • Why does this occur? • Let’s understand stratification. • What does it do? • You’re sampling within strata. • Suppose we know the population proportions?
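
One way to see why the group proportions bounce around in small samples is the usual approximation for the standard error of a sample proportion under simple random sampling, sqrt(p(1-p)/n). The sketch below applies it to the population proportions from the slides. The helper `se_prop` is a hypothetical name introduced here, and the finite population correction is ignored on the assumption that n is tiny relative to N.

```r
# Group sizes from the slides' population
sizes <- c(13000, 12000, 4000, 70000)
p <- sizes / sum(sizes)              # true population proportions

# Approximate standard error of a sample proportion under simple random sampling
# (finite population correction ignored because n is small relative to N)
se_prop <- function(p, n) sqrt(p * (1 - p) / n)

se_100  <- se_prop(p, 100)           # the sample size used on the slides
se_1000 <- se_prop(p, 1000)          # a sample ten times larger, for comparison

# Standard errors in percentage points, one column per group
round(100 * rbind(n100 = se_100, n1000 = se_1000), 2)
```

Because n sits under a square root, a sample ten times larger shrinks the standard error by a factor of sqrt(10), roughly 3.2, not by a factor of 10.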

  22. R: Identifying Strata and then Sampling from them.

          #Stratified Sampling
          #Creating the Groupings
          #Note: every stratum is filled with the label 1, so each table of a
          #within-stratum sample shows a single category
          strata1<- rep(1, 13000)
          strata2<- rep(1, 12000)
          strata3<- rep(1, 4000)
          strata4<- rep(1, 70000)
          #Sampling by strata
          #Selecting observations proportional to known population values: Proportionate Sampling
          set.seed(52524425)
          srs4<-sample(strata1, size=13, replace=FALSE)
          ctab(table(srs4))
          set.seed(4244225)
          srs5<-sample(strata2, size=12, replace=FALSE)
          ctab(table(srs5))
          set.seed(33325)
          srs6<-sample(strata3, size=4, replace=FALSE)
          ctab(table(srs6))
          set.seed(1114225)
          srs7<-sample(strata4, size=70, replace=FALSE)
          ctab(table(srs7))

  23. R: Results? Proportional Sampling w/small samples.

          > srs4<-sample(strata1, size=13, replace=FALSE)
          > ctab(table(srs4))
               Count  Total %
          srs4
          1       13      100
          > set.seed(4244225)
          > srs5<-sample(strata2, size=12, replace=FALSE)
          > ctab(table(srs5))
               Count  Total %
          srs5
          1       12      100
          > set.seed(33325)
          > srs6<-sample(strata3, size=4, replace=FALSE)
          > ctab(table(srs6))
               Count  Total %
          srs6
          1        4      100
          > set.seed(1114225)
          > srs7<-sample(strata4, size=70, replace=FALSE)
          > ctab(table(srs7))
               Count  Total %
          srs7
          1       70      100

  24. Proportionate Sampling • What do we see? • If we know the proportions of the relevant stratification variable(s)… • Then sample from the groups. • SMALL SAMPLES can reproduce certain characteristics of the population. • But of course, it is probabilistic.
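
A compact variant of proportionate stratified sampling keeps the group labels, so the pooled sample can be tabulated against the population directly. This is a sketch, not the lecture's code: it uses base R `split()` and `mapply()`, an arbitrary seed, and a total sample size of 99 chosen here so that the allocation comes out in whole numbers.

```r
set.seed(101)                                    # arbitrary seed
population <- rep(1:4, c(13000, 12000, 4000, 70000))

# Allocate the sample across strata in proportion to stratum size
n_total <- 99                                    # assumption: chosen so shares are whole numbers
alloc <- as.vector(round(n_total * table(population) / length(population)))

# One vector per stratum, with the original group labels preserved
strata <- split(population, population)

# Simple random sample within each stratum, then pool the pieces
pieces <- mapply(function(x, k) sample(x, size = k, replace = FALSE),
                 strata, alloc, SIMPLIFY = FALSE)
strat_sample <- unlist(pieces, use.names = FALSE)

table(strat_sample)
```

By construction the pooled sample always contains 13, 12, 4, and 70 elements from groups 1 through 4. Which individuals fill those slots is still random, but the group shares no longer vary from sample to sample.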

  25. Disproportionate Sampling • Why? • “Oversampling” may be of interest when research centers on small pockets in the population. • Race is often an issue in this context.

  26. R: Disproportionate Sampling

          > #Sampling by strata
          > #Selecting observations disproportional to known population values: Disproportionate Sampling
          > #"Oversampling by Race"
          > set.seed(5555425)
          > srs8<-sample(strata1, size=24, replace=FALSE)
          > ctab(table(srs8))
               Count  Total %
          srs8
          1       24      100
          > set.seed(4222225)
          > srs9<-sample(strata2, size=22, replace=FALSE)
          > ctab(table(srs9))
               Count  Total %
          srs9
          1       22      100
          > set.seed(103325)
          > srs10<-sample(strata3, size=14, replace=FALSE)
          > ctab(table(srs10))
                Count  Total %
          srs10
          1        14      100
          > set.seed(11534)
          > srs11<-sample(strata4, size=70, replace=FALSE)
          > ctab(table(srs11))
                Count  Total %
          srs11
          1        70      100

  27. Disproportionate Samples • What did I ask R to do? • I “oversampled” for some groups. • Again, understand why we, as researchers, might want to do this.
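
Oversampling distorts the group shares in the pooled sample, so estimates for the whole population are usually computed with design weights (stratum population size divided by stratum sample size). Weighting is not shown in the slides; the sketch below is a standard technique applied to the slides' stratum sizes and the oversampled allocation of 24, 22, 14, and 70. The names `N_h`, `n_h`, and `w_h` are introduced here for illustration.

```r
# Stratum sizes in the population and the oversampled allocation from the slides
N_h <- c(13000, 12000, 4000, 70000)
n_h <- c(24, 22, 14, 70)

# Design weight: how many population members each sampled element stands for
w_h <- N_h / n_h

# Group shares in the pooled sample, unweighted versus weighted
unweighted <- n_h / sum(n_h)
weighted   <- (n_h * w_h) / sum(n_h * w_h)

# Shares in percent, next to the true population shares
round(100 * cbind(unweighted, weighted, population = N_h / sum(N_h)), 2)
```

The unweighted shares overstate the three smaller groups, while the weighted shares match the population shares exactly. That is the trade: oversampling buys more cases for small groups, and weights restore representativeness when the groups are pooled.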

  28. Side-trip: Sample Sizes • Who is happy with a 0.1 percent SRS? • On the other hand… • What do we get from a stratified sample? • Suppose we increase n in a SRS? • It’s R time!

  29. R: SRS with a 1 percent sample

          > #Sample Size=1000
          > set.seed(1775233)
          > srs1<-sample(population, size=1000, replace=FALSE)
          > ctab(table(srs1))
               Count  Total %
          srs1
          1    129.0     12.9
          2     97.0      9.7
          3     46.0      4.6
          4    728.0     72.8
          > set.seed(5200563)
          > srs2<-sample(population, size=1000, replace=FALSE)
          > ctab(table(srs2))
               Count  Total %
          srs2
          1    117.0     11.7
          2    127.0     12.7
          3     41.0      4.1
          4    715.0     71.5
          > set.seed(52909)
          > srs3<-sample(population, size=1000, replace=FALSE)
          > ctab(table(srs3))
               Count  Total %
          srs3
          1    147.0     14.7
          2    126.0     12.6
          3     39.0      3.9
          4    688.0     68.8

  30. Implications? • Sample Size MATTERS • What do we see? • Note, again, what stratification “buys” us. • The issues with stratification? • Another R example (code posted on website)

  31. R • We again have 4 groups • The table below is “My Population”

          > set.seed(52352)
          > urn<-sample(c(1,2,3,4),size=1000, replace=TRUE)
          > ctab(table(urn))
              Count  Total %
          urn
          1   239.0     23.9
          2   253.0     25.3
          3   268.0     26.8
          4   240.0     24.0

  32. R version of a person-on-the-street interview

          > #Convenience Sample: What shows up
          > con<-urn[1:10]; con
           [1] 1 1 1 3 4 2 4 3 4 3
          > ctab(table(con))
              Count  Total %
          con
          1       3       30
          2       1       10
          3       3       30
          4       3       30

  33. R and Samples, redux • What do we find? • Very unreliable sample: we oversample some groups, undersample others. • Useless data more than likely. • What do you imagine happens when we increase the sample sizes?

  34. R and SRS with samples of size n

          #Sample: Sizes 10, 50, 75, 100, 200, 250, 900, 1000
          set.seed(562)
          s1<-sample(urn, 10, replace=FALSE)
          ctab(table(s1))
          set.seed(58862)
          s1a<-sample(urn, 50, replace=FALSE)
          ctab(table(s1a))
          set.seed(562657)
          s1b<-sample(urn, 75, replace=FALSE)
          ctab(table(s1b))
          set.seed(58862)
          s2<-sample(urn, 100, replace=FALSE)
          ctab(table(s2))
          set.seed(58862)
          s3<-sample(urn, 200, replace=FALSE)
          ctab(table(s3))
          set.seed(10562)
          s4<-sample(urn, 250, replace=FALSE)
          ctab(table(s4))
          set.seed(22562)
          s5<-sample(urn, 900, replace=FALSE)
          ctab(table(s5))
          set.seed(56882)
          s6<-sample(urn, 1000, replace=FALSE)
          ctab(table(s6))

  35. Sampling and Sample Size

          > #Sample: Sizes 10, 50, 75, 100, 200, 250, 900, 1000
          > set.seed(562)
          > s1<-sample(urn, 10, replace=FALSE)
          > ctab(table(s1))
             Count  Total %
          s1
          1      2       20
          2      4       40
          3      2       20
          4      2       20
          > set.seed(58862)
          > s1a<-sample(urn, 50, replace=FALSE)
          > ctab(table(s1a))
              Count  Total %
          s1a
          1      13       26
          2      13       26
          3      13       26
          4      11       22

  36. Sample Sizes

          > set.seed(562657)
          > s1b<-sample(urn, 75, replace=FALSE)
          > ctab(table(s1b))
              Count  Total %
          s1b
          1   22.00    29.33
          2   18.00    24.00
          3   22.00    29.33
          4   13.00    17.33
          > set.seed(58862)
          > s2<-sample(urn, 100, replace=FALSE)
          > ctab(table(s2))
             Count  Total %
          s2
          1     27       27
          2     24       24
          3     22       22
          4     27       27

  37. Sample Size

          > set.seed(58862)
          > s3<-sample(urn, 200, replace=FALSE)
          > ctab(table(s3))
             Count  Total %
          s3
          1     54       27
          2     48       24
          3     48       24
          4     50       25
          > set.seed(10562)
          > s4<-sample(urn, 250, replace=FALSE)
          > ctab(table(s4))
             Count  Total %
          s4
          1   62.0     24.8
          2   67.0     26.8
          3   56.0     22.4
          4   65.0     26.0

  38. Sample Size

          > set.seed(22562)
          > s5<-sample(urn, 900, replace=FALSE)
          > ctab(table(s5))
              Count  Total %
          s5
          1  220.00    24.44
          2  231.00    25.67
          3  234.00    26.00
          4  215.00    23.89
          > set.seed(56882)
          > s6<-sample(urn, 1000, replace=FALSE)
          > ctab(table(s6))
             Count  Total %
          s6
          1  239.0     23.9
          2  253.0     25.3
          3  268.0     26.8
          4  240.0     24.0

  39. R: What did we learn? • Sample size seems to have some impact here. • But there are trade-offs.
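
The single draws on the preceding slides can be turned into a small simulation: repeat the simple random sample many times at each size and look at how much the share of group 1 varies. This is a sketch under assumptions. The urn is rebuilt the same way as on the slides but with a different, arbitrary seed, and the 200 repetitions and the list of sizes are choices made here.

```r
set.seed(42)                                     # arbitrary seed
urn <- sample(1:4, size = 1000, replace = TRUE)  # same construction as the slides' urn

# For each sample size, draw 200 simple random samples and record
# the standard deviation of the sampled share of group 1
sizes <- c(10, 50, 100, 250, 900)
spread <- sapply(sizes, function(n) {
  shares <- replicate(200, mean(sample(urn, n, replace = FALSE) == 1))
  sd(shares)
})
names(spread) <- sizes

round(spread, 3)
```

The spread falls as n grows, but with diminishing returns, which is one side of the trade-off: each additional respondent costs about the same while buying less and less precision.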

  40. Important Moving Parts • Randomness (covered!) • Sampling Frame • Random sampling from a bad sampling frame produces bad samples. • Sample Size • What is your intuition about sample sizes? • Must they always be large? • Not necessarily so…although…
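
The point about sampling frames can be illustrated with the population from the earlier slides. In this hypothetical sketch, the frame omits group 3 entirely (for example, because the list used to draw the sample never included those people). The seed and the sample size of 1,000 are arbitrary.

```r
set.seed(7)                                      # arbitrary seed
population <- rep(1:4, c(13000, 12000, 4000, 70000))

# Hypothetical flawed sampling frame: group 3 is missing from the list
frame <- population[population != 3]

# A perfectly random draw, but from the wrong frame
srs_bad <- sample(frame, size = 1000, replace = FALSE)

# Group 3 has zero cases no matter how large or how random the sample is
table(factor(srs_bad, levels = 1:4))
```

Randomness only guarantees representativeness of the frame, not of the population, so no increase in sample size repairs a frame that leaves people out.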

  41. Bad Sampling • Person-on-the-Street Interviews • What do these imply? • Small samples and inherently nonrandom • Likely poor inference. • Other examples? • Not all non-random samples are necessarily bad • Purposive Samples
