Sampling Techniques

MMSI – Saturday Study Session with Mr. Flynn
How to choose the right representative sample, increase variability, and reduce bias

The first question is: why sample? We’d like to know about an entire population of individuals, but examining all of them is usually impractical, if not impossible. We settle for examining a smaller group of individuals, a sample, selected from the population.

We draw samples because we can’t always work with the entire population. We need to be sure that the statistics we compute from the sample reflect the corresponding parameters accurately. A sample that does this is said to be representative.

Why bother determining the right sampling method? Wouldn’t it be better to just include everyone and “sample” the entire population? Such a special sample is called a census.

There are problems with taking a census:
• It can be difficult to complete a census: there always seem to be some individuals who are hard to locate or hard to measure.
• Populations rarely stand still. Even if you could take a census, the population changes while you work, so it’s never possible to get a perfect measure.
• Taking a census may be more complex than sampling.

Sampling is a natural thing to do. Think about sampling something you are cooking: you taste (examine) a small part of what you’re cooking to get an idea about the dish as a whole; you don’t eat the whole meal!

For example, why do we not want a census in this case? Since education is primarily the responsibility of each state, an educational research agency will survey random people in the state to estimate the number of adult heads of household in the nation who have earned a high school diploma. Identify a sampling method that would help achieve this goal.

Samples that don’t represent every individual in the population fairly are said to be biased. Bias is the bane of sampling, the one thing above all to avoid. There is usually no way to fix a biased sample and no way to salvage useful information from it. The best way to avoid bias is to select individuals for the sample at random. The value of deliberately introducing randomness is one of the great insights of Statistics.

Who’s Who?

What Can Go Wrong? or, How to Sample Badly

Sample Badly with Volunteers: In a voluntary response sample, a large group of individuals is invited to respond, and all who do respond are counted. Voluntary response samples are almost always biased, and so conclusions drawn from them are almost always wrong. Voluntary response samples are often biased toward those with strong opinions or those who are strongly motivated. Since the sample is not representative, the resulting voluntary response bias invalidates the survey.

Sample Badly, but Conveniently: In convenience sampling, we simply include the individuals who are convenient. Unfortunately, this group may not be representative of the population. Convenience sampling is not only a problem for students or other beginning samplers. In fact, it is a widespread problem in the business world: the easiest people for a company to sample are its own customers.

Sample from a Bad Sampling Frame: An SRS from an incomplete sampling frame introduces bias because the individuals included may differ from the ones not in the frame.

Undercoverage: Many of these bad survey designs suffer from undercoverage, in which some portion of the population is not sampled at all or has a smaller representation in the sample than it has in the population. Undercoverage can arise for a number of reasons, but it’s always a potential source of bias.

What Else Can Go Wrong?

Watch out for nonrespondents. A common and serious potential source of bias for most surveys is nonresponse bias. No survey succeeds in getting responses from everyone. The problem is that those who don’t respond may differ from those who do. And they may differ on just the variables we care about.

Don’t bore respondents with surveys that go on and on and on and on… Surveys that are too long are more likely to be refused, reducing the response rate and biasing all the results.

Work hard to avoid influencing responses. Response bias refers to anything in the survey design that influences the responses. For example, the wording of a question can influence the responses.

How to Think About Biases

• Look for biases in any survey you encounter: there’s no way to recover from a biased sample or a survey that asks biased questions.
• Spend your time and resources reducing biases.
• If you possibly can, pretest your survey.
• Always report your sampling methods in detail.
• Use randomization when at all possible.

What have we learned?

Bias can also arise from poor sampling methods:
• Voluntary response samples are almost always biased and should be avoided and distrusted.
• Convenience samples are likely to be flawed for similar reasons.
• Even with a reasonable design, sampling frames may not be representative.
• Undercoverage occurs when individuals from a subgroup of the population are selected less often than they should be.

Random sampling refers to sampling strategies that give every element of a population an equal chance of selection and every group an equal chance of being together. Strategies include:
• simple random sampling (SRS)
• systematic sampling
• stratified random sampling
• cluster sampling

In simple random sampling, all elements of a population have an equal chance of inclusion. It is considered ‘fair’, but is rarely used in practice because the process demands identification of all elements of the population, a list of all those elements, and finally a way of randomly selecting from this list. The larger the population, the more difficult it is to use SRS.

Activity: Assign each square a number from 1 to 40; then use a random number generator to pick apples, for example 6, 10, 19, 21, 27, 36, 40. Can you use playing cards to take an SRS of 40?
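The apple activity can also be tried on a computer. The following is a minimal sketch, assuming the sampling frame is simply the numbers 1 to 40 and the sample size is 7 as in the activity; the function name `simple_random_sample` and the optional `seed` parameter are illustrative choices, not part of the slides.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame; every subset of size n is equally likely."""
    rng = random.Random(seed)          # seed is optional, only for repeatable demos
    return sorted(rng.sample(frame, n))

# Sampling frame: apples (squares) numbered 1 to 40, as in the activity.
apples = list(range(1, 41))

# Pick 7 apples at random; the result changes from run to run.
print(simple_random_sample(apples, 7))
```

Running it several times gives a different set of apples each time, which is the same thing that happens when a class repeats the activity with a random number generator.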

An SRS is the standard against which we measure other sampling methods, and the sampling method on which the theory of working with sampled data is based. To select a sample at random, we first need to define where the sample will come from. The sampling frame is a list of individuals/units from which the sample is drawn. Once we have our sampling frame, the easiest way to choose an SRS is with random numbers.

Samples drawn at random generally differ from one another. Each draw of random numbers selects different units for our sample. These differences lead to different values for the variables we measure. We call these sample-to-sample differences sampling variability.
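Sampling variability is easy to see by drawing several samples from the same population. The sketch below assumes a made-up population of 40 apple weights in grams (the weights are invented for illustration and do not come from the slides) and computes the mean of five different simple random samples of size 7.

```python
import random
import statistics

def sample_means(population, n, draws, seed=None):
    """Return the mean of each of `draws` simple random samples of size n."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, n)) for _ in range(draws)]

# Hypothetical population: weights (grams) of 40 apples, invented for this demo.
rng = random.Random(0)
weights = [round(rng.gauss(150, 20)) for _ in range(40)]

# Five samples from the same population give five (usually different) means.
means = sample_means(weights, n=7, draws=5, seed=1)
print(means)
```

The means differ even though nothing about the population changed; only the random draw changed.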

Systematic sampling involves selecting every nth case within a defined population. It may involve going to every 10th house or selecting every 20th person on a list. It is easier to do than devising methods for random selection, and offers a close approximation of random sampling as long as the elements are randomly ordered.

From the population, use a random number generator to choose a starting point and sample every 5th apple, for example 5, 10, 15, 20, 25, 30, 35, 40.
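The apple example can be sketched in a few lines. This assumes the same frame of apples numbered 1 to 40 and a step of 5; the starting point is chosen at random from the first 5 apples, so the sample 5, 10, ..., 40 is just one of five possible outcomes. The function name `systematic_sample` is an illustrative choice.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Pick a random start among the first k units, then take every k-th unit."""
    rng = random.Random(seed)
    start = rng.randrange(k)           # index 0 .. k-1, i.e. one of the first k units
    return frame[start::k]

apples = list(range(1, 41))            # apples numbered 1 to 40
print(systematic_sample(apples, 5))    # 8 apples, spaced 5 apart
```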

Sometimes we draw a sample by selecting individuals systematically. For example, you might survey every 10th person on an alphabetical list of students. To make it random, you must still start the systematic selection from a randomly selected individual. When there is no reason to believe that the order of the list could be associated in any way with the responses sought, systematic sampling can give a representative sample.

Systematic sampling can be much less expensive than true random sampling. When you use a systematic sample, you need to justify the assumption that the systematic method is not associated with any of the measured variables.

Stratified random sampling involves dividing your population into various subgroups and then taking a simple random (or systematic) sample within each one.

After dividing the population into categories, use a random number generator to choose sample apples in each group, or stratum. Group 1: 6, 10, 12, 16, 19. Group 2: 1, 5, 7, 14, 16.
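Here is a minimal sketch of the stratified apple example. It assumes two groups of 20 apples each, numbered 1 to 20 within their group, with 5 apples sampled per group; the group sizes are an assumption made for illustration, since the slides only list the selected apples.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a separate simple random sample of the same size inside each stratum."""
    rng = random.Random(seed)
    return {
        name: sorted(rng.sample(units, n_per_stratum))
        for name, units in strata.items()
    }

# Assumed layout: two strata of 20 apples, numbered within each group.
strata = {
    "Group 1": list(range(1, 21)),
    "Group 2": list(range(1, 21)),
}

for name, chosen in stratified_sample(strata, 5).items():
    print(name, chosen)
```

Because each group is sampled separately, every group is guaranteed to appear in the sample, which a single SRS of the whole population cannot promise.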

Sometimes the population is first sliced into homogeneous groups, called strata, before the sample is selected. Then simple random sampling is used within each stratum before the results are combined. This common sampling design is called stratified random sampling.

Stratified random sampling can reduce bias. Stratifying can also reduce the variability of our results. When we restrict by strata, additional samples are more like one another, so statistics calculated for the sampled values will vary less from one sample to another.
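The claim that stratifying reduces variability can be checked with a small simulation. This is a sketch under invented assumptions: a population of 100 values made of two equal-sized strata whose values are very different (around 100 and around 200), with samples of size 10 taken either as one SRS or as 5 from each stratum. None of these numbers come from the slides.

```python
import random
import statistics

def srs_mean(population, n, rng):
    """Mean of one simple random sample of size n from the whole population."""
    return statistics.mean(rng.sample(population, n))

def stratified_mean(strata, n_per_stratum, rng):
    """Mean of one stratified sample; strata are equal-sized, so pooling is fair."""
    values = []
    for units in strata:
        values.extend(rng.sample(units, n_per_stratum))
    return statistics.mean(values)

rng = random.Random(42)
small = [rng.gauss(100, 5) for _ in range(50)]   # hypothetical stratum 1
large = [rng.gauss(200, 5) for _ in range(50)]   # hypothetical stratum 2
population = small + large

# Repeat each design many times and compare how much the sample means vary.
srs_means = [srs_mean(population, 10, rng) for _ in range(1000)]
strat_means = [stratified_mean([small, large], 5, rng) for _ in range(1000)]

print("SD of SRS means:       ", statistics.stdev(srs_means))
print("SD of stratified means:", statistics.stdev(strat_means))
```

The SRS means vary more because the mix of small and large values changes from sample to sample; the stratified design fixes that mix at 5 and 5.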

Cluster sampling involves surveying whole clusters of the population selected through a defined random sampling strategy. The thinking here is that the best way to find high school students is through high schools, or the best way to find churchgoers is through churches.

Instead of individual apples, use groups of 3 and use a random number generator to select groups of apples. For example, sample group 4, group 15, group 21, and group 29.
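The grouped-apple example might be sketched as follows. It assumes 90 apples arranged into 30 groups of 3 (the slides do not say how many groups there are, only that group numbers go at least as high as 29), and it selects 4 whole groups at random.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Choose whole clusters at random and keep every unit inside each one."""
    rng = random.Random(seed)
    chosen = sorted(rng.sample(sorted(clusters), n_clusters))
    return {c: clusters[c] for c in chosen}

# Assumed layout: 90 apples in 30 groups of 3 (group 1 holds apples 1-3, and so on).
apples = list(range(1, 91))
clusters = {g: apples[3 * (g - 1): 3 * g] for g in range(1, 31)}

for group, members in cluster_sample(clusters, 4).items():
    print("group", group, "->", members)
```

Note that the random numbers pick groups, not apples: every apple in a chosen group ends up in the sample.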

Sometimes stratifying isn’t practical and simple random sampling is difficult. Splitting the population into similar parts or clusters can make sampling more practical. Then we could select one or a few clusters at random and perform a census within each of them. This sampling design is called cluster sampling. If each cluster fairly represents the full population, cluster sampling will give us an unbiased sample.

Cluster sampling is not the same as stratified sampling. We stratify to ensure that our sample represents different groups in the population, and sample randomly within each stratum. Strata are homogeneous, but differ from one another. Clusters are more or less alike, each heterogeneous and resembling the overall population. We select clusters to make sampling more practical or affordable.

Sometimes we use a variety of sampling methods together, as in this two-stage cluster sample. Sampling schemes that combine several methods are called multistage samples. Most surveys conducted by professional polling organizations use some combination of stratified and cluster sampling as well as simple random sampling.

After selecting the groups of apples, use a random number generator to choose one of the 3 apples in each of group 4, group 15, group 21, and group 29.
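A two-stage version of the apple example could look like this. As before, the layout of 90 apples in 30 groups of 3 is an assumption for illustration; stage one picks 4 groups at random, and stage two picks 1 apple at random inside each chosen group.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: choose clusters at random. Stage 2: take an SRS inside each one."""
    rng = random.Random(seed)
    chosen = sorted(rng.sample(sorted(clusters), n_clusters))
    return {c: sorted(rng.sample(clusters[c], n_per_cluster)) for c in chosen}

# Assumed layout: 90 apples in 30 groups of 3.
apples = list(range(1, 91))
clusters = {g: apples[3 * (g - 1): 3 * g] for g in range(1, 31)}

for group, picked in two_stage_sample(clusters, 4, 1).items():
    print("group", group, "-> apple", picked[0])
```

The only difference from plain cluster sampling is the second random draw: instead of taking all 3 apples in a chosen group, only 1 is kept.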

Non-random sampling refers to strategic requests for ‘volunteers’; the use of informants that ‘snowball’; or ‘hand picking’ respondents. Keep in mind that selecting a sample on the basis of convenience alone can threaten a study's credibility.

Volunteer Sampling

Volunteer sampling simply refers to the process of selecting a sample by asking for volunteers. This may involve putting an ad in the newspaper or going to local organizations such as churches, schools, or community groups.
O'Leary, Z. (2004) The Essential Guide to Doing Research. London: Sage. Chapter Eight.

Handpicked Sampling

Handpicked sampling involves selecting cases that meet particular criteria; are considered typical; show wide variance; represent ‘expertise’; or cover a range of possibilities. Other options include the selection of critical, extreme, deviant, or politically important cases.
O'Leary, Z. (2004) The Essential Guide to Doing Research. London: Sage. Chapter Eight.

Identify which type of sampling design is being used in each scenario.
a. A school administrator randomly selects 12 classes from your school and then randomly selects 5 students from each class to study a school library issue.
b. A school administrator uses random numbers to select a sample of 60 students from the roster of students enrolled in your school.
c. A school administrator gets a sample of 60 students from your school by randomly selecting 15 freshmen, 15 sophomores, 15 juniors, and 15 seniors.
d. A school administrator uses the roster of students enrolled in your school to select a sample of students by choosing a person randomly from among the first 20 and then taking every 20th name on the roster thereafter.

3. Researchers often mark wildlife in order to identify particular individuals across time or space. A study of butterfly migration is designed to determine which location on the butterflies' wings is best for marking. Because marks in certain locations may be more likely to attract predators or cause problems than marks in other locations, the goal is to determine whether six different marking locations result in equivalent chances of successful migration. To test this, researchers plan to mark 3,600 butterflies and release them, then count how many arrive displaying each marking location at the end of the migratory path.
(a) Briefly describe a method you could use to assign the marking locations if you wanted to ensure that exactly 600 butterflies were marked in each location.
(b) Briefly describe a method you could use to assign the marking locations if you wanted the assignments to be independent from one butterfly to the next, and wanted each location assigned with probability 1/6 each time.

Have we answered all your questions about sampling? If not, what else would you like to know? Why? If yes, then you can now be considered a certifiable sampling genius! Why?
