Chapter 4

Chapter 4 Gathering Data

Looking Back • In Chapters 2 & 3 we learned how to describe data both graphically and numerically. • For these statistical analyses to be useful, we must have good data. • In fact, the way a study is designed (how we gather data) can have a major impact on the results of the study. • The purpose of this course is for you to learn what you can conclude about an entire population given a sample from that population. • If a study is poorly designed and implemented, the results may be meaningless or misleading.

Two Scenarios • Study 1 • A U.S. study (2000) compared 469 patients with brain cancer to 422 patients who did not have brain cancer. The patients’ cell phone use was measured using a questionnaire. The two groups’ use of cell phones was similar. • Study 2 • An Australian study (1997) used 200 transgenic mice. One hundred were exposed for two 30-minute periods a day to the same kind of microwaves, at roughly the same power, as those transmitted from a cell phone. The other 100 mice were not exposed. After 18 months, the brain tumor rate for the exposed mice was twice as high as that for the unexposed mice. Example taken from Statistics: The Art and Science of Learning from Data

Questions to Consider • How do the two studies differ? • Study 1 • No treatments assigned • Patients merely questioned • Study 2 • Uses mice in hopes of generalizing to humans

Questions to Consider • Why do the results of different medical studies sometimes disagree? • Differing types of studies, data collection methods, or sampling frames • Could the second study be performed on human beings? • No, because it would be unethical to knowingly expose humans to possibly harmful microwaves.

Questions to Consider • Suppose a friend recently diagnosed with brain cancer was a frequent cell phone user. Is this strong evidence that frequent cell phone use increases the likelihood of getting brain cancer? • Informal observations of this type are called anecdotal evidence. • You should rely on reputable research studies, not anecdotes.

Two Main Ways to Gather Data • Observational Study • The researcher observes values of the response and explanatory variables for the sampled subjects without imposing any treatments • Example: Study 1 • Experiment • The researcher assigns experimental conditions (also called treatments) to subjects (also called experimental units) and then observes outcomes on the response variable. • Treatments correspond to values of the explanatory variable • Example: Study 2

Advantages of Experiments over Observational Studies • In an observational study, there can always be lurking variables affecting the results. • This means that an observational study alone can never definitively show causation. • It is easier to adjust for lurking variables in an experiment. • In general, we can study the effect of an explanatory variable on a response variable more accurately with an experiment than with an observational study.

Disadvantages of Experiments • They can be unethical to perform on the subjects in which you are interested. • It can be difficult to monitor subjects to ensure that they are doing what they are told. • They can take many years, even decades, to complete. • Results of experiments that use animals may not generalize to humans. • They are unnecessary when the question of interest does not involve trying to assess causality.

Example 4.1 • A large study of student drug use and how it depends on drug testing enrolled 76,000 middle and high school students. Each student in the study filled out a questionnaire. One question asked whether the student used drugs. The study found that drug use was not affected by student drug testing. • This is an example of an observational study. • Could there be any lurking variables? • Frequency of drug testing, whether testing is random, etc. Example taken from Statistics: The Art and Science of Learning from Data

Example 4.2 • A researcher buys seeds of two different varieties of corn. He randomly selects 30 seeds of each variety and plants them in his backyard, making sure to label the location of each seed and its type. He then measures how long it takes each seed to sprout. At the end of the study he compares the average germination time of the different varieties. • This is an example of an experiment. • Could there be any lurking variables? • Soil quality, temperature Used with permission from Dr. Ellen Toby

Example 4.3 • A researcher has seeds of only one variety of tomato. She has 60 nearly identical pots of soil and plants one tomato seed in each. She randomly selects 30 pots and keeps them at 75° F. The other 30 pots she keeps at 65° F. Aside from temperature, she provides the same growing conditions to all pots. She then measures how long it takes for the seeds to sprout. At the end of the study she compares the average germination time of the different temperature groups. • This is an example of an experiment. • Are there any lurking variables? • No, everything has been controlled here. Used with permission from Dr. Ellen Toby
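The random assignment step in Example 4.3 can be sketched in a few lines of Python. This is only an illustration: the function name `randomly_assign`, the pot numbering, the treatment labels, and the seed are assumptions made here, not part of the original example.

```python
import random

def randomly_assign(units, treatments, seed=None):
    """Shuffle the units, then split them evenly among the treatments."""
    rng = random.Random(seed)          # a seed makes the assignment reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)
    group_size = len(shuffled) // len(treatments)
    return {
        treatment: sorted(shuffled[i * group_size:(i + 1) * group_size])
        for i, treatment in enumerate(treatments)
    }

# 60 pots, numbered 1..60 (hypothetical labels), split between two temperatures
pots = list(range(1, 61))
groups = randomly_assign(pots, ["75F", "65F"], seed=4)
print(len(groups["75F"]), len(groups["65F"]))  # 30 30
```

Because chance alone decides which pots go to which temperature, any remaining differences between pots should be spread roughly evenly across the two groups.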

Types of Observational Studies • Retrospective • Observational studies that look back in time • This is sometimes done to find risk factors for certain diseases • Cross-Sectional • Observational studies that take a cross section of the population at the current time • Prospective • Observational studies in which subjects are followed into the future

Sampling Designs for Observational Studies • Simple Random Sampling (SRS) • A simple random sample of n subjects from a population is one in which each possible sample of that size has the same chance of being selected.
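A minimal sketch of drawing a simple random sample in Python, assuming the population is available as a list. The names `simple_random_sample` and `students` are hypothetical and used only for illustration. The standard library's `random.sample` draws without replacement, so every possible set of n subjects is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n subjects without replacement; every sample of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)

# Hypothetical population of 100 students
students = [f"student_{i}" for i in range(1, 101)]
srs = simple_random_sample(students, 10, seed=1)
print(len(srs))  # 10
```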

Sampling Designs for Observational Studies • Stratified Sampling • A stratified random sample divides the population into separate groups, called strata, and then selects an SRS of subjects from each stratum.
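A stratified random sample could be sketched as follows. The helper name `stratified_sample`, the class-year strata, and the equal sample size per stratum are assumptions chosen for illustration; a real design might sample different numbers from each stratum.

```python
import random
from collections import defaultdict

def stratified_sample(subjects, stratum_of, n_per_stratum, seed=None):
    """Group subjects into strata, then take an SRS of subjects from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for subject in subjects:
        strata[stratum_of(subject)].append(subject)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: 300 students, each tagged with a class year
years = ["freshman", "sophomore", "junior"]
students = [(f"student_{i}", years[i % 3]) for i in range(300)]
sample = stratified_sample(students, stratum_of=lambda s: s[1],
                           n_per_stratum=5, seed=2)
print({year: len(chosen) for year, chosen in sample.items()})
```

Every stratum contributes exactly five subjects, which is what guarantees enough subjects in each group being compared.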

Sampling Designs for Observational Studies • Cluster Sampling • A cluster random sample can be used if the target population naturally divides into groups (called clusters), each of which is representative of the entire target population. In this method, an SRS of clusters is taken. Every member of the selected clusters is put into the sample.
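A sketch of cluster sampling, assuming the groups are stored as a dictionary that maps each group name to its members. The function name `cluster_sample` and the dormitory data are hypothetical. Note that the random selection is applied to the groups, not to individual subjects.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Take an SRS of whole groups, then keep every member of the chosen groups."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # SRS of group names
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical population: 6 dormitories with 20 residents each
dorms = {f"dorm_{d}": [f"dorm_{d}_resident_{r}" for r in range(20)]
         for d in "ABCDEF"}
chosen, members = cluster_sample(dorms, 2, seed=3)
print(len(chosen), len(members))  # 2 40
```

Only a list of groups is needed to draw the sample, which is why this design does not require a sampling frame of individual subjects.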

Sampling Designs for Observational Studies • Systematic Sampling • A systematic sample selects every kth person from the sampling frame. The researcher randomly selects a number between 1 and k to determine which person to select first, then selects every kth person after that.
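The systematic sampling procedure can be sketched as below, assuming the frame is an ordered list. The function name `systematic_sample` and the frame of 100 numbered subjects are illustrative assumptions.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Pick a random start between 1 and k, then take every kth subject after it."""
    rng = random.Random(seed)
    start = rng.randint(1, k)        # inclusive on both ends: 1, 2, ..., k
    return frame[start - 1::k]       # convert the 1-based start to a 0-based index

# Hypothetical frame of 100 subjects numbered 1..100
frame = list(range(1, 101))
sample = systematic_sample(frame, k=10, seed=5)
print(len(sample))  # 10
```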

Advantages of the Various Sampling Designs • Simple Random Sampling (SRS) • It is the easiest and most widespread form of sampling. • Each subject has an equal chance of being in the sample. • The sample enables us to determine how likely it is that descriptive statistics (like the sample mean) fall close to the corresponding values about which we would like to make inferences (like the population mean).

Advantages of the Various Sampling Designs • Stratified Sampling • It ensures that there are enough subjects in each group that you want to compare. • Cluster Sampling • It does not require a sampling frame of subjects. • It is less expensive to implement.

Bias in Sampling • A sampling method is biased if • The sample tends to favor some parts of the population over others. • In other words, the results from the sample are not representative of the population. • Obviously, unbiased samples are our goal.

Types of Bias • Undercoverage • Occurs when a sampling frame leaves out some groups in the population • Nonresponse bias • Occurs when some sampled subjects cannot be reached, refuse to participate or fail to answer some questions • Response bias • Occurs when the subject gives an incorrect response or when the question wording or the way the interviewer asks the questions is confusing or misleading
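To make undercoverage concrete, here is a small sketch with invented numbers. Assume a population of 1,000 people in which 300 have no phone, and assume a phone survey whose sampling frame therefore leaves them out. All counts below are hypothetical and chosen only so the arithmetic is easy to follow.

```python
# Hypothetical population, stored as (has_phone, supports_policy) pairs.
# The counts are invented for illustration, not taken from any study.
population = (
    [(True, True)] * 280 + [(True, False)] * 420 +    # 700 people with a phone
    [(False, True)] * 210 + [(False, False)] * 90     # 300 people without one
)

def support_rate(people):
    """Proportion of people who support the policy."""
    return sum(supports for _, supports in people) / len(people)

# A phone survey's sampling frame leaves out everyone without a phone
frame = [person for person in population if person[0]]

print(support_rate(population))  # 0.49
print(support_rate(frame))       # 0.4
```

Even a perfect SRS from this frame would aim at 40% rather than the true 49%, because the omitted group differs from the group that was covered.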

Examples of Poor Samples that Result in Bias • Convenience Samples • Sampling friends • Sampling at the mall • Voluntary Response Samples • Internet surveys • Call-in surveys

Example 4.4 • In her 1987 book Women and Love, Shere Hite presented results of a survey mailed to 100,000 women in the United States. One of her conclusions was that 70% of women who had been married at least five years have extramarital affairs. She based this conclusion on the replies of only 4500 women. • This is an example of Example taken from Statistics: The Art and Science of Learning from Data
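The response rate implied by the figures in Example 4.4 is easy to check. The variable names below are illustrative; the two counts come from the example itself.

```python
mailed = 100_000    # surveys mailed, from the example
replied = 4_500     # replies received, from the example

response_rate = replied / mailed
nonresponse_rate = 1 - response_rate

print(f"Response rate: {response_rate:.1%}")        # Response rate: 4.5%
print(f"Nonresponse rate: {nonresponse_rate:.1%}")  # Nonresponse rate: 95.5%
```

With fewer than 1 in 20 women replying, there is little reason to expect those who replied to be representative of all 100,000 who were mailed the survey.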
