1 / 127

Introduction to Biostatistics for Clinical Researchers

Introduction to Biostatistics for Clinical Researchers. University of Kansas Department of Biostatistics & University of Kansas Medical Center Department of Internal Medicine. Schedule. Friday, December 3 in 1023 Orr-Major Friday, December 10 in 1023 Orr-Major

zuzana
Télécharger la présentation

Introduction to Biostatistics for Clinical Researchers

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Introduction to Biostatistics for Clinical Researchers University of Kansas Department of Biostatistics & University of Kansas Medical Center Department of Internal Medicine

  2. Schedule Friday, December 3 in 1023 Orr-Major Friday, December 10 in 1023 Orr-Major Friday, December 17 in B018 School of Nursing Possibility of a 5th lecture, TBD All lectures will be held from 8:30a - 10:30a

  3. Materials • PowerPoint files can be downloaded from the Department of Biostatistics website at http://biostatistics.kumc.edu • A link to the recorded lectures will be posted in the same location

  4. Sampling Variability and Confidence Intervals

  5. Topics • Sampling distribution of a sample mean • Variability in the sampling distribution • Standard error of the mean • Standard error versus standard deviation • Confidence intervals for the population mean μ • Sampling distribution of a sample proportion • Standard error and confidence intervals for a proportion

  6. The Random Sampling Behavior of a Sample Mean Across Multiple Random Samples

  7. Random Sample • When a sample is randomly selected from a population, it is called a random sample • Technically speaking, values in a random sample are representative of the distribution of the values in the population, regardless of size • In a simple random sample, each individual in the population has an equal chance of being chosen for the sample • Random sampling helps control systematic bias • Even with random sampling, there is still sampling variability or error

  8. Sampling Variability of a Sample Statistic • If we repeatedly choose samples from the same population, a statistic will take different values in different samples • If the statistic does not change much from sample to sample, then it is fairly reliable (does not have a lot of variability)

  9. Example: Blood Pressure of Males • Recall, we had worked with data on blood pressures using a random sample of 113 men taken from the population of all men • Assume the population distribution is given by the following:

  10. Example: Blood Pressure of Males • Suppose we had all the time in the world • We decide to do an experiment • We are going to take 500 separate random samples from this population of men, each with 20 subjects • For each of the 500 samples, we will plot a histogram of the sample BP values and record the sample mean and sample standard deviation

  11. Random Samples • Sample 1: n = 20 • Sample 2: n = 20

  12. Example: Blood Pressure of Males • We did this 500 times—let’s look at a histogram of the 500 sample means

  13. Example: Blood Pressure of Males • We decide to do another experiment • We are going to take 500 separate random samples from this population of men, each with 50 subjects • For each of the 500 samples, we will plot a histogram of the sample BP values and record the sample mean and sample standard deviation

  14. Random Samples • Sample 1: n = 50 • Sample 2: n = 50

  15. Example: Blood Pressure of Males • We did this 500 times—now let’s look at a histogram of the 500 sample means

  16. Example: Blood Pressure of Males • We decide to do one more experiment • We are going to take 500 separate random samples from this population of men, each with 100 subjects • For each of the 500 samples, we will plot a histogram of the sample BP values and record the sample mean and sample standard deviation

  17. Random Samples • Sample 1: n = 100 • Sample 2: n = 100

  18. Example: Blood Pressure of Males • We did this 500 times—lets look at a histogram of the 500 sample means

  19. Example: Blood Pressure of Males • Let’s review the results • Population distribution of individual BP measurements for males is normal • μ = 125 mmHg; σ = 14 mmHg • Results from 500 random samples:

  20. Example: Blood Pressure of Males • Let’s review the results

  21. Example: Hospital Length of Stay • Recall, we had worked with the data on length of stay (LOS) using a random sample of 500 patients taken from all patients discharged in 2005 • Assume the population distribution is given by the following:

  22. Example: Hospital Length of Stay • Boxplot

  23. Example: Hospital Length of Stay • Suppose we had all the time in the world, again • We decide to do another set of experiments • We are going to take 500 separate random samples from this population of patients, each with 20 subjects • For each of the 500 samples we will plot a histogram of the sample LOS values and record the sample mean and standard deviation

  24. Random Samples • Sample 1: n = 20 • Sample 2: n = 20

  25. Example: Hospital Length of Stay • We did this 500 times—let’s look at a histogram of the 500 sample means

  26. Example: Hospital Length of Stay • Suppose we had all the time in the world, again • We decide to do another experiment • We are going to take 500 separate random samples from this population of patients, each with 50 subjects • For each of the 500 samples we will plot a histogram of the sample LOS values and record the sample mean and standard deviation

  27. Random Samples • Sample 1: n = 50 • Sample 2: n = 50

  28. Example: Hospital Length of Stay • We did this 500 times—lets look at a histogram of the 500 sample means

  29. Example: Hospital Length of Stay • Suppose we had all the time in the world, again • We decide to do one more experiment • We are going to take 500 separate random samples from this population of patients, each with 100 subjects • For each of the 500 samples we will plot a histogram of the sample LOS values and record the sample mean and standard deviation

  30. Random Samples • Sample 1: n = 100 • Sample 2: n = 100

  31. Example: Hospital Length of Stay • We did this 500 times—lets look at a histogram of the 500 sample means

  32. Example: Hospital Length of Stay • Let’s review the results • Population distribution of individual LOS values for population of patients is right skewed • μ = 5.05 days; σ = 6.90 days • Results from 500 random samples:

  33. Example: Hospital Length of Stay • Let’s review the results

  34. Summary • What did we see across the two examples? • A few trends: • Distribution of sample means tended to be approximately normal, even with the original individual level data was not (LOS) • Variability in the sample mean values decreased as the size of the sample of each mean was based upon increased • Distribution of sample means was centered at true population mean

  35. Clarification • Variation in the sample mean values is tied to the size of each sample selected in our exercise (i.e., 20, 50, or 100), not to the number of samples (i.e., 500)

  36. The Theoretical Sampling Distribution of the Sample Mean and Its Estimate Based on a Single Sample

  37. Sampling Distribution of the Sample Mean • In the previous section we reviewed the results of simulations that resulted in estimates of what’s formally called the sampling distribution of the sample mean • The sampling distribution of the sample mean is a theoretical probability distribution • It describes the distribution of sample means from all possible random samples of the same size taken from a population

  38. Sampling Distribution of the Sample Mean • For example, the histogram below is an estimate of the sampling distribution of sample BP means based on random samples of n = 50 from the population of all men

  39. Sampling Distribution of the Sample Mean • In research, it is impossible to estimate the sampling distribution of a sample mean by actually taking many random samples from the same population • No research would ever happen if a study needed to be repeated multiple times to understand this sampling behavior • Simulations are useful to illustrate a concept, but not to highlight a practical approach • Luckily, there is some mathematical machinery that generalizes some of the patterns we saw in the previous simulation results

  40. The Central Limit Theorem (CLT) • The Central Limit Theorem is a powerful mathematical tool that gives several useful results • The sampling distribution of sample means based on all samples of size n is approximately normal, regardless of the distribution of the original, individual-level data in the population/sample • The mean of all sample means in the sampling distribution is the true mean of the population from which the samples were taken (μ) • The standard deviation of the sample means taken from samples of size n is equal to : this is often called the standard error of the mean,

  41. Example: Blood Pressure of Males • The population distribution of individual BP measurements for males is normal with μ = 125 mmHg and σ = 14 mmHg

  42. Example: Blood Pressure of Males • The population distribution of individual BP measurements for males is normal with μ = 125 mmHg and σ = 14 mmHg

  43. Example: Blood Pressure of Males • The population distribution of individual BP measurements for males is normal with μ = 125 mmHg and σ = 14 mmHg

  44. Recap: CLT • The CLT tells us: • When taking a random sample of continuous measures of size n from a population with true mean μ and true standard deviation σ, the theoretical sampling distribution of sample means from all possible random samples of size n is as follows:

  45. CLT: So What? • So what good is this information? • Using the properties of the normal curve, this shows that for most random samples we can take (i.e., 95% of them), the sample mean will fall within 2 SE of the true mean (actually, 1.96 SE)

  46. CLT: So What? • AGAIN, what good is this information? • We are going to take a single sample of size n and get one • We won’t know μ, and if we did know μ why would we care about the distribution of estimates of μ from imperfect subsets of the population?

  47. CLT: So What? • We are going to take a single sample of size n and get one • But for most (i.e., 95%) of the random samples we can get, our will fall within ± 1.96 SE of μ

  48. CLT: So What? • We are going to take a single sample of size n and get one • So if we start at and go 1.96 SE in either direction, the interval created will contain μ most (i.e., 95%) of the time

  49. Estimating a Confidence Interval • Such an interval is called a 95% confidence interval for the population mean μ • The interval is given by the formula: • Problem: we don’t know σ, either! • We can estimate it with s, and will detail this in the next section • What is the interpretation of a confidence interval?

  50. Interpretation of a 95% Confidence Interval (CI) • Laypersons’ range of plausible values for the true mean • Researchers never can observe the true mean μ • is the best estimate based on a single sample • The 95% CI starts with this best estimate and additionally recognizes uncertainty in this quantity • Technical interpretation: • Were 100 random samples of size n taken from the same population and 95% confidence interval limits computed from each of these 100 samples, 95 of the 100 intervals would contain the value of the true mean μ

More Related