Understanding Confidence Intervals and Hypothesis Testing in Biostatistics

Medical Biometry I (Biostatistics 511) Discussion Section Week 8 Mike Garcia Biostat 511

Discussion Outline
• Calculating confidence interval for population mean (μ)
  • When population standard deviation (σ) is known
  • When population standard deviation (σ) is not known
• I have a confidence interval. How do I interpret it?
• Hypothesis testing
  • z-test (σ known) and t-test (σ not known)
  • p-values
  • Example and interpretation
• Putting it all together
  • Connections, more interpretations

Confidence intervals for population mean (population σ known)
What we want to know: what is the population mean cholesterol for hypertensive men?
What we have: a random sample of 25 hypertensive men and their cholesterol, plus knowledge that the population cholesterol standard deviation for hypertensive men is 45 mg/ml.
The data:
233.47 203.76 204.66 279.39 189.35
227.17 187.55 234.37 234.37 274.89
241.58 160.53 189.35 167.74 205.56
231.67 160.53 266.79 163.23 222.67
202.86 272.19 229.87 219.06 297.40
What would be an estimate of the population mean cholesterol for hypertensive men?

Confidence intervals for population mean (population σ known)
What would be an estimate of the population mean cholesterol for hypertensive men? The sample mean: $\bar{x} = 220$ mg/ml.
But we would like some measure of uncertainty for this estimate. This is often expressed by a confidence interval. 95% confidence intervals are most common.
Here is the 95% confidence interval calculated from our data: (202.36, 237.64).
How did we get this?

Confidence intervals for population mean (population σ known)
General formula for confidence interval of the mean: $\bar{x} \pm z_{1-\alpha/2} \cdot \sigma/\sqrt{n}$
General formula for 95% confidence interval of the mean (α = 1 - 0.95 = 0.05): $\bar{x} \pm z_{0.975} \cdot \sigma/\sqrt{n}$
Plug in values calculated from the sample ($\bar{x}$), from a look-up table ($z_{0.975}$), and already known to us (σ, n). Use a calculator or Stata.
• What would happen if our sample size was larger?
• What if we wanted a 99% CI? 90% CI?
• What if σ was larger/smaller?
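The plug-in step for the σ-known interval can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function name `z_confidence_interval` is made up for this example, and `NormalDist().inv_cdf` from the standard library stands in for the z look-up table.

```python
from math import sqrt
from statistics import NormalDist, mean

# The 25 cholesterol measurements from the slides.
cholesterol = [
    233.47, 203.76, 204.66, 279.39, 189.35,
    227.17, 187.55, 234.37, 234.37, 274.89,
    241.58, 160.53, 189.35, 167.74, 205.56,
    231.67, 160.53, 266.79, 163.23, 222.67,
    202.86, 272.19, 229.87, 219.06, 297.40,
]

def z_confidence_interval(sample, sigma, level=0.95):
    """CI for the population mean when the population sd (sigma) is known.

    Hypothetical helper for illustration only.
    """
    n = len(sample)
    x_bar = mean(sample)
    # z_{1 - alpha/2}; replaces the look-up table (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

lower, upper = z_confidence_interval(cholesterol, sigma=45)
print(f"sample mean = {mean(cholesterol):.2f}")
print(f"95% CI = ({lower:.2f}, {upper:.2f})")
```

With σ = 45 and n = 25 the half-width is about 1.96 × 45/5 = 17.64, so the interval is the sample mean plus or minus roughly 17.64.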

Confidence intervals for population mean (population σ known)
What would happen if our sample size was larger?
It would mean plugging in a larger n, which shrinks σ/√n and makes for a tighter CI, i.e. the endpoints would be closer to the sample mean.

Confidence intervals for population mean (population σ known)
What if we wanted a 99% CI? 90% CI?
A 99% CI means a larger value for z ($z_{0.995} \approx 2.576$) and thus a wider interval. A 90% CI means a smaller value for z ($z_{0.95} \approx 1.645$) and a tighter interval. It would also affect the interpretation.

Confidence intervals for population mean (population σ known)
What if σ was larger/smaller?
Larger σ means a wider interval. Smaller σ means a tighter interval. Makes sense: sampling from less diffuse data should mean less uncertainty.
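The three "what if" questions can be checked numerically by comparing the half-width $z_{1-\alpha/2} \cdot \sigma/\sqrt{n}$ across scenarios. The sketch below is illustrative only: the alternative values (n = 100, σ = 30, σ = 60) are arbitrary choices for this example, not numbers from the slides.

```python
from math import sqrt
from statistics import NormalDist

def half_width(sigma, n, level=0.95):
    """Half-width of the sigma-known CI: z_{1 - alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * sigma / sqrt(n)

# Baseline matches the slides; the other rows change one input at a time.
scenarios = {
    "baseline (sigma=45, n=25, 95%)": half_width(45, 25),
    "larger n (n=100)": half_width(45, 100),
    "99% CI": half_width(45, 25, level=0.99),
    "90% CI": half_width(45, 25, level=0.90),
    "larger sigma (sigma=60)": half_width(60, 25),
    "smaller sigma (sigma=30)": half_width(30, 25),
}

for label, hw in scenarios.items():
    print(f"{label:<32} half-width = {hw:6.2f}")
```

Quadrupling n halves the half-width, because n enters only through √n.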

Confidence intervals for population mean (population σ NOT known)
What we want to know: what is the population mean cholesterol for hypertensive men?
What we have: the same random sample of 25 hypertensive men and their cholesterol (the 25 values listed earlier). This time we do not know the population cholesterol standard deviation for hypertensive men, so we estimate it from the sample.
What would be an estimate of the population mean cholesterol for hypertensive men?

Confidence intervals for population mean (population σ NOT known)
General formula for confidence interval of the mean: $\bar{x} \pm t_{n-1,\,1-\alpha/2} \cdot s/\sqrt{n}$
General formula for 95% confidence interval of the mean: $\bar{x} \pm t_{n-1,\,0.975} \cdot s/\sqrt{n}$
Plug in values calculated from the sample ($\bar{x}$, s), from a look-up table ($t_{24,\,0.975}$), and already known to us (n). Use a calculator or Stata.
• If we drew another sample of the same size, which values would be the same and which would likely change?
• What would happen if our sample size was larger?
• What if we wanted a 99% CI? 90% CI?

Confidence intervals for population mean (population σ NOT known)
If we drew another sample of the same size, which values would be the same and which would likely change?
We’d likely get different values for $\bar{x}$ and s. Our n would remain fixed. Our t would remain the same, assuming we still want a 95% confidence interval.

Confidence intervals for population mean (population σ NOT known)
What would happen if our sample size was larger?
It would mean plugging in a larger n (and a slightly smaller t, since the degrees of freedom n - 1 grow), which would typically make for a tighter CI, i.e. the endpoints would be closer to the sample mean.

Confidence intervals for population mean (population σ NOT known)
What if we wanted a 99% CI? 90% CI?
A 99% CI means a larger value for t and thus a wider interval. A 90% CI means a smaller value for t and a tighter interval.
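A minimal sketch of the σ-unknown interval follows. It is not from the slides. Python's standard library has no t quantile function, so the critical value $t_{24,\,0.975} \approx 2.0639$ is hard-coded here as if read from a t table; the constant name `T_CRIT_24` and the function name are assumptions for this example.

```python
from math import sqrt
from statistics import mean, stdev

cholesterol = [
    233.47, 203.76, 204.66, 279.39, 189.35,
    227.17, 187.55, 234.37, 234.37, 274.89,
    241.58, 160.53, 189.35, 167.74, 205.56,
    231.67, 160.53, 266.79, 163.23, 222.67,
    202.86, 272.19, 229.87, 219.06, 297.40,
]

# Table value for t with 24 degrees of freedom at the 0.975 quantile
# (hard-coded assumption; only valid for n = 25 and a 95% interval).
T_CRIT_24 = 2.0639

def t_confidence_interval(sample, t_crit):
    """CI for the population mean using the sample sd in place of sigma."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)          # sample sd, divides by n - 1
    half_width = t_crit * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width

lower, upper = t_confidence_interval(cholesterol, T_CRIT_24)
print(f"s = {stdev(cholesterol):.1f}")
print(f"95% CI = ({lower:.2f}, {upper:.2f})")
```

For a different sample size or confidence level the hard-coded critical value would have to be replaced with the matching table entry.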

Confidence interval for population mean: interpretation
Scientific collaborator asking statistician some questions:
Q: What is your best estimate of the population mean?
A: The sample mean! For our sample, it is 220.
Q: But how sure are you that it is the population mean?
A: I don’t know if it is or not, but I can tell you the 95% confidence interval calculated from our data is (204.07, 235.93).
Q: Ok, so there’s a 95% chance that the pop. mean is in that interval, right?
A: Not quite! The true mean either is or it isn’t in that confidence interval. So we can’t put a probability on it. However, I can tell you that if I were to repeat this experiment over and over again, about 95% of the confidence intervals produced would contain the true population mean.
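The "repeat this experiment over and over" interpretation can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is taken to be normal with σ = 45, and the true mean of 215 is an arbitrary made-up value, since in practice it is unknown. The σ-known interval is used to keep the code short.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(true_mean, sigma, n, level=0.95, repeats=10_000, seed=511):
    """Fraction of simulated sigma-known CIs that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(repeats):
        # Draw one sample of size n and compute its mean.
        x_bar = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        # Each interval either contains the true mean or it does not.
        if x_bar - half_width <= true_mean <= x_bar + half_width:
            hits += 1
    return hits / repeats

rate = coverage(true_mean=215, sigma=45, n=25)
print(f"share of intervals containing the true mean: {rate:.3f}")
```

Any single interval is simply right or wrong. The 95% describes the long-run share of intervals that contain the true mean.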

Hypothesis testing for population mean (population σ known: z-test)
• Known facts:
  • In the general population, men have mean cholesterol of 211 mg/ml with standard deviation 45 mg/ml.
• What we want to know:
  • Do men in the hypertensive population have different mean cholesterol than men in the general population?
• What we have:
  • A random sample of 25 hypertensive men and their cholesterol.
  • Knowledge that the population std. dev. for hypertensive men is the same as that of the general population (45 mg/ml).

Hypothesis testing for population mean (population σ known: z-test)
Set up our hypotheses, i.e. the possible “true” scenarios:
H0: μ = 211
Ha: μ > 211 or μ < 211
Decide on our alpha-level, or Type-I error rate. Typically 5%: α = 0.05
If our data really does not look like it was drawn from a population with μ = 211, we will go ahead and say so.
More formally: Let’s suppose we lived in a world where hypertensive men actually are the same as everyone else (i.e. H0 is true)! Determine sample mean rejection regions so that when we repeat this experiment over and over, we’d wrongly reject only 5% of the time.
How do we determine these rejection regions?

Hypothesis testing for population mean (population σ known: z-test)
Still supposing we live in a world where hypertensive men actually are the same as everyone else (i.e. H0 is true):
Say we took MANY samples of 25 hypertensive male cholesterols and found the sample mean for each of these samples.
[Figure: histogram of these millions of sample means]
What two-sided rejection region gives us α = 0.05, also known as a 5% Type-I error rate?
Reject when $\bar{x} < 193.36$ or $\bar{x} > 228.64$.
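The thought experiment of drawing many samples under H0 can be mimicked in code. The sketch below is illustrative only and assumes individual cholesterol values are normally distributed, which the slides do not state. It computes the rejection bounds from the null distribution of the sample mean, then checks how often simulated sample means land in the rejection region.

```python
import random
from math import sqrt
from statistics import NormalDist

MU_0, SIGMA, N, ALPHA = 211, 45, 25, 0.05

# Under H0 the sample mean is Normal(mu_0, sigma / sqrt(n)).
null_dist = NormalDist(mu=MU_0, sigma=SIGMA / sqrt(N))
lower = null_dist.inv_cdf(ALPHA / 2)        # 2.5th percentile
upper = null_dist.inv_cdf(1 - ALPHA / 2)    # 97.5th percentile

rng = random.Random(511)
repeats = 20_000
rejections = 0
for _ in range(repeats):
    # One simulated study: 25 men drawn from the null population.
    x_bar = sum(rng.gauss(MU_0, SIGMA) for _ in range(N)) / N
    if x_bar < lower or x_bar > upper:
        rejections += 1

type_1_rate = rejections / repeats
print(f"reject when x_bar < {lower:.2f} or x_bar > {upper:.2f}")
print(f"simulated Type-I error rate: {type_1_rate:.3f}")
```

The simulated rate should sit close to 0.05, because the bounds were chosen to cut off 2.5% in each tail.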

Hypothesis testing for population mean (population σ known: z-test)
In summary, for a two-sided test with α = 0.05, we reject if
$\bar{x} < \mu_0 + z_{0.025} \cdot \sigma/\sqrt{n}$ or $\bar{x} > \mu_0 + z_{0.975} \cdot \sigma/\sqrt{n}$
where μ0 is the mean under the null hypothesis (in our example, 211).
NOTE: Some may be more comfortable operating on the z-score scale. What we did above is mathematically the same thing as using z-scores:
$Z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
and rejecting if $Z < z_{0.025}$ or $Z > z_{0.975}$.
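To see why the two scales give the same decision, the upper rejection rule can be rearranged step by step. The steps are valid because σ/√n is positive, so dividing by it does not flip the inequality. The lower rule works the same way with $z_{0.025}$.

```latex
\begin{align*}
\bar{x} > \mu_0 + z_{0.975}\,\frac{\sigma}{\sqrt{n}}
  &\iff \bar{x} - \mu_0 > z_{0.975}\,\frac{\sigma}{\sqrt{n}} \\
  &\iff \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} > z_{0.975} \\
  &\iff Z > z_{0.975}
\end{align*}
```

With the example's numbers, the upper bound on the mean scale is 211 + 1.96 × 45/√25 = 211 + 17.64 = 228.64, and the lower bound is 211 - 17.64 = 193.36.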

Hypothesis testing for population mean (population σ known: z-test)
The data from our sample: the same 25 cholesterol values listed earlier.
Method 1: On the sample mean scale: $\bar{x} = 220$. Our rejection regions were $\bar{x} < 193.36$ and $\bar{x} > 228.64$. Our $\bar{x}$ does not fall in the regions. So we have insufficient evidence to reject the null hypothesis.
Method 2: On the Z-score scale: $Z = \dfrac{220 - 211}{45/\sqrt{25}} = 1.0$. This is not $< z_{0.025}$ or $> z_{0.975}$ (< -1.96 or > 1.96). Thus we have insufficient evidence to reject the null hypothesis.
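Both methods can be run side by side from the summary numbers. The following is a sketch, not the course's own code; the function name `z_test` and its return layout are made up for this example.

```python
from math import sqrt
from statistics import NormalDist

def z_test(x_bar, mu_0, sigma, n, alpha=0.05):
    """Two-sided z-test, reported on the mean scale and on the z scale."""
    se = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Method 1: compare the sample mean with the rejection bounds.
    bounds = (mu_0 - z_crit * se, mu_0 + z_crit * se)
    reject_on_mean_scale = x_bar < bounds[0] or x_bar > bounds[1]
    # Method 2: compare the z-score with the critical value.
    z = (x_bar - mu_0) / se
    reject_on_z_scale = abs(z) > z_crit
    return z, bounds, reject_on_mean_scale, reject_on_z_scale

z, bounds, by_mean, by_z = z_test(x_bar=220, mu_0=211, sigma=45, n=25)
print(f"Z = {z:.2f}, bounds = ({bounds[0]:.2f}, {bounds[1]:.2f})")
print(f"reject (mean scale): {by_mean}, reject (z scale): {by_z}")
```

The two flags always agree, since one rule is an algebraic rearrangement of the other.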

Hypothesis testing for population mean (population σ NOT known: t-test)
• In the previous example, we knew the population sd. What if we don’t?
• Known facts:
  • In the general population, men have mean cholesterol of 211 mg/ml with standard deviation 45 mg/ml.
• What we want to know:
  • Do men in the hypertensive population have different mean cholesterol than men in the general population?
• What we have:
  • A random sample of 25 hypertensive men and their cholesterol.
  • No knowledge of the population sd for hypertensive men, so we estimate it with the sample sd.

Hypothesis testing for population mean (population σ NOT known: t-test)
We set up our hypotheses and α-level just as we did before:
H0: μ = 211
Ha: μ > 211 or μ < 211
Decide on our α-level, or Type-I error rate. Typically 5%: α = 0.05
If our data really does not look like it was drawn from a population with μ = 211, we will go ahead and say so.
More formally: Let’s suppose we lived in a world where hypertensive men actually are the same as everyone else (i.e. H0 is true)! Determine sample mean rejection regions so that when we repeat this experiment over and over, we’d wrongly reject only 5% of the time.
How do we determine these rejection regions?

Hypothesis testing for population mean (population σ NOT known: t-test)
• In the same way, except we replace the z’s with t’s, and the true σ with the sample std. dev. s. For a 2-sided test with α = 0.05, reject the null when
  $\bar{x} < \mu_0 + t_{n-1,\,0.025} \cdot s/\sqrt{n}$ or $\bar{x} > \mu_0 + t_{n-1,\,0.975} \cdot s/\sqrt{n}$
• μ0 is the mean under the null hypothesis (in our example, μ0 = 211)
• s is the sample standard deviation
• What we did above is mathematically the same thing as using t-scores. So alternatively, we can calculate:
  $T = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
• And reject the null if $T < t_{n-1,\,0.025}$ or $T > t_{n-1,\,0.975}$
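Applied to the example's summary statistics (n = 25, $\bar{x}$ = 220, s = 38.6, μ0 = 211), the t-score calculation looks roughly like this. It is a sketch with one loud assumption: the critical value $t_{24,\,0.975} \approx 2.0639$ is hard-coded as a table look-up, because Python's standard library has no t quantile function. The names `T_CRIT_24` and `t_statistic` are invented for this example.

```python
from math import sqrt

# t_{24, 0.975} from a t table (assumption: only valid for n = 25, alpha = 0.05).
T_CRIT_24 = 2.0639

def t_statistic(x_bar, mu_0, s, n):
    """T = (x_bar - mu_0) / (s / sqrt(n))."""
    return (x_bar - mu_0) / (s / sqrt(n))

x_bar, mu_0, s, n = 220, 211, 38.6, 25
t = t_statistic(x_bar, mu_0, s, n)

# The same rule on the sample mean scale.
se = s / sqrt(n)
lower, upper = mu_0 - T_CRIT_24 * se, mu_0 + T_CRIT_24 * se

print(f"T = {t:.4f}, critical value = {T_CRIT_24}")
print(f"reject when x_bar < {lower:.2f} or x_bar > {upper:.2f}")
print(f"reject H0: {abs(t) > T_CRIT_24}")
```

By symmetry of the t distribution, $t_{24,\,0.025} = -t_{24,\,0.975}$, so a single critical value covers both tails.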

One sample t-test example in Stata
We can do all of this in Stata using the ttesti command. Its arguments, in order, are: sample size (25), sample mean (220), sample std. dev. (38.6), null mean (211).

    . ttesti 25 220 38.6 211

    One-sample t test
    ------------------------------------------------------------------------------
             |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
           x |      25         220        7.72        38.6    204.0667    235.9333
    ------------------------------------------------------------------------------
        mean = mean(x)                                                t =   1.1658
    Ho: mean = 211                                   degrees of freedom =       24

        Ha: mean < 211               Ha: mean != 211               Ha: mean > 211
     Pr(T < t) = 0.8724         Pr(|T| > |t|) = 0.2551          Pr(T > t) = 0.1276

One sample t-test example in Stata
H0: μ = 211, Ha: μ < 211
From the ttesti output: Pr(T < t) = 0.8724.
In a world where H0 is true, the probability of seeing a sample mean even smaller than the one we observed (< 220) is 87.24%. This is a p-value.

One sample t-test example in Stata
H0: μ = 211, Ha: μ > 211
From the ttesti output: Pr(T > t) = 0.1276.
In a world where H0 is true, the probability of seeing a sample mean even greater than the one we observed (> 220) is 12.76%. This is a p-value.

One sample t-test example in Stata
H0: μ = 211, Ha: μ ≠ 211
From the ttesti output: Pr(|T| > |t|) = 0.2551.
In a world where H0 is true, the probability of seeing a sample mean more extreme than the one we observed (> 220 or < 202) is 25.51%. This is a p-value.
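The three p-values in the Stata output are all tail areas of the same t distribution with 24 degrees of freedom. The sketch below shows how they relate to each other. It is not how Stata computes them: as an assumption for this example, the t CDF is approximated by integrating the t density with Simpson's rule, since Python's standard library has no t distribution.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # area between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

t_obs, df = 1.1658, 24
p_lower = t_cdf(t_obs, df)                  # Ha: mean < 211
p_upper = 1 - p_lower                       # Ha: mean > 211
p_two_sided = 2 * min(p_lower, p_upper)     # Ha: mean != 211

print(f"Pr(T < t)     = {p_lower:.4f}")
print(f"Pr(T > t)     = {p_upper:.4f}")
print(f"Pr(|T| > |t|) = {p_two_sided:.4f}")
```

The two one-sided p-values sum to 1, and the two-sided p-value is twice the smaller one.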

One sample t-test example in Stata
In our sample of cholesterol measurements from 25 hypertensive males, we observed a mean cholesterol of 220 mg/ml (95% CI: 204.07, 235.93). We conducted a two-sided hypothesis test, using the t-test, with the null hypothesis that the mean cholesterol of hypertensive males is the same as the mean cholesterol of the general male population (211 mg/ml). Our test resulted in a t-score of 1.17 (two-sided p = 0.2551). This does not fall in the two-sided α = 0.05 rejection region (|t| > 2.064 with 24 degrees of freedom), so it is not a statistically significant result. We thus conclude that we do not have sufficient evidence to reject the null hypothesis. Note that this does not mean the null hypothesis is true, just that we do not have sufficient evidence to rule it out.
In practice, conclusions/interpretations will not be this wordy. We do so here for thoroughness.

Summary
• Some takeaways
  • Hypothesis testing can be done on the mean scale or the z-scale (t-scale if we don’t know the standard deviation). We can also use p-values (we only covered them in passing here). Another way, if we are doing 2-sided testing: just calculate the 100(1-α)% confidence interval (e.g. 95% CI for α = 0.05). If the null mean is not in the interval, reject it. These are all mathematically equivalent.
  • If we do not reject the null, it does not imply the null is true! It simply means we don’t have sufficient evidence to reject it.
  • What does α = 0.05 mean? One overly simplified example: in clinical trials it means we are willing to let through 5% of drugs that have no effect. We don’t know how many drugs have no effect. We just know we are willing to let through 5% of them.
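The equivalence between the two-sided t-test and the confidence interval can be checked directly. The sketch below is illustrative only: it reuses the example's summary statistics, hard-codes $t_{24,\,0.975} \approx 2.0639$ as a table look-up, and tries a handful of arbitrary null means chosen for this example.

```python
from math import sqrt

T_CRIT_24 = 2.0639           # t_{24, 0.975}, table value (assumption)
X_BAR, S, N = 220, 38.6, 25  # summary statistics from the example
SE = S / sqrt(N)

def reject_by_ci(mu_0):
    """Reject H0 when the null mean falls outside the 95% CI."""
    lower, upper = X_BAR - T_CRIT_24 * SE, X_BAR + T_CRIT_24 * SE
    return mu_0 < lower or mu_0 > upper

def reject_by_t(mu_0):
    """Reject H0 when |T| exceeds the critical value."""
    return abs((X_BAR - mu_0) / SE) > T_CRIT_24

# Arbitrary null means on both sides of the interval's endpoints.
for mu_0 in (190, 204, 205, 211, 235, 236, 250):
    print(f"mu_0 = {mu_0}: CI rule -> {reject_by_ci(mu_0)}, "
          f"t rule -> {reject_by_t(mu_0)}")
```

For the null mean used in the slides, 211, both rules say "do not reject", because 211 lies inside (204.07, 235.93).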
