Hypothesis Testing: p-value

STAT 101 Dr. Kari Lock Morgan Hypothesis Testing: p-value • SECTION 4.2 • Randomization distribution • p-value

Paul the Octopus http://www.youtube.com/watch?v=3ESGpRUMj9E

Hypotheses • In 2010, Paul the Octopus predicted the outcomes of 8 World Cup games, and predicted them all correctly • Is this evidence that Paul’s chance of guessing correctly, p, is really greater than 50%? • What are the null and alternative hypotheses? • H0: p ≠ 0.5, Ha: p = 0.5 • H0: p = 0.5, Ha: p ≠ 0.5 • H0: p = 0.5, Ha: p > 0.5 • H0: p > 0.5, Ha: p = 0.5

Key Question • How unusual is it to see a sample statistic as extreme as that observed, if H0 is true? • If it is very unusual, we have statistically significant evidence against the null hypothesis • Today’s Question: How do we measure how unusual a sample statistic is, if H0 is true?

Measuring Evidence against H0 To see if a statistic provides evidence against H0, we need to see what kind of sample statistics we would observe, just by random chance, if H0 were true

Paul the Octopus • We need to know what kinds of statistics we would observe just by random chance, if the null hypothesis were true • How could we figure this out? Simulate many samples of size n = 8 with p = 0.5

Simulate! • We can simulate this with a coin! • Each coin flip = a guess between two teams (Heads = correct, Tails = incorrect) • Flip a coin 8 times, count the number of heads, and calculate the sample proportion of heads • Did you get all 8 heads (correct)? (a) Yes (b) No • How extreme is Paul’s sample proportion of 1?

Paul the Octopus • Based on your simulation results, for a sample size of n = 8, do you think p̂ = 1 is statistically significant? • Yes • No

Randomization Distribution A randomization distribution is a collection of statistics from samples simulated assuming the null hypothesis is true • The randomization distribution shows what types of statistics would be observed, just by random chance, if the null hypothesis were true

Lots of simulations! • For a better randomization distribution, we need many more simulations! www.lock5stat.com/statkey
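The coin-flipping activity can also be run in software. The following is a minimal sketch, not StatKey’s implementation: it assumes NumPy is available, and the seed and the number of simulations (10,000) are arbitrary choices made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for reproducibility

n_guesses = 8            # games Paul predicted
null_p = 0.5             # chance of a correct guess if H0 is true
n_simulations = 10_000   # assumed number of simulated samples

# Each simulated sample is the number of heads (correct guesses) in 8 fair coin flips.
correct_counts = rng.binomial(n=n_guesses, p=null_p, size=n_simulations)
sample_proportions = correct_counts / n_guesses

# Tabulate the randomization distribution: share of samples at each possible p-hat.
for k in range(n_guesses + 1):
    share = np.mean(correct_counts == k)
    print(f"p-hat = {k / n_guesses:.3f}: {share:.4f}")
```

The printed table plays the role of the dotplot: most simulated proportions sit near 0.5, and very few reach 1.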

Randomization Distribution

Paul the Octopus • Based on StatKey’s simulation results, for a sample size of n = 8, do you think p̂ = 1 is statistically significant? • Yes • No

Key Question • A randomization distribution tells us what kinds of statistics we would see just by random chance, if the null hypothesis is true • This makes it straightforward to assess how extreme the observed statistic is! How unusual is it to see a sample statistic as extreme as that observed, if H0 is true?

Randomization Distribution In a hypothesis test for H0: μ = 12 vs Ha: μ < 12, we have a sample with n = 45 and x̄ = 10.2. What do we require about the method to produce randomization samples? • μ = 12 • μ < 12 We need to generate randomization samples assuming the null hypothesis is true.

Randomization Distribution In a hypothesis test for H0: μ = 12 vs Ha: μ < 12, we have a sample with n = 45 and x̄ = 10.2. Where will the randomization distribution be centered? • 10.2 • 12 • 45 • 1.8 Randomization distributions are always centered around the null hypothesized value.

Randomization Distribution Center A randomization distribution is centered at the value of the parameter given in the null hypothesis. • A randomization distribution simulates samples assuming the null hypothesis is true, so the simulated statistics are centered at the null value

Randomization Distribution In a hypothesis test for H0: μ = 12 vs Ha: μ < 12, we have a sample with n = 45 and x̄ = 10.2. What will we look for on the randomization distribution? • How extreme 10.2 is • How extreme 12 is • How extreme 45 is • What the standard error is • How many randomization samples we collected We want to see how extreme the observed statistic is.
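To see why the center lands on the null value, here is a hedged sketch of one common way to build a randomization distribution for a single mean: shift the sample so its mean equals the null value, then resample with replacement. The slides do not give the raw data, so the 45 values below are made up purely for illustration, and the shift-and-resample method is an assumption about the procedure rather than something stated on the slide.

```python
import numpy as np

rng = np.random.default_rng(seed=4)  # arbitrary seed

null_mean = 12
# HYPOTHETICAL data: 45 made-up values standing in for the real sample.
sample = rng.normal(loc=10.2, scale=4.0, size=45)
observed_mean = sample.mean()

# Shift the sample so that H0 (mean = 12) is true for the shifted values.
shifted = sample - observed_mean + null_mean

# Resample with replacement from the shifted data and record each sample mean.
sim_means = np.array([
    rng.choice(shifted, size=shifted.size, replace=True).mean()
    for _ in range(5_000)
])

print("center of randomization distribution:", round(sim_means.mean(), 2))

# Ha is mu < 12, so "as extreme" means as small as the observed mean or smaller.
p_value = np.mean(sim_means <= observed_mean)
print("proportion as extreme as observed:", p_value)
```

The center printed is close to 12 regardless of the observed mean, because the samples were generated under the null hypothesis.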

Randomization Distribution In a hypothesis test for H0: μ1 = μ2 vs Ha: μ1 > μ2, we have a sample with x̄1 = 26 and x̄2 = 21. What do we require about the method to produce randomization samples? • μ1 = μ2 • μ1 > μ2 • 26, 21 We need to generate randomization samples assuming the null hypothesis is true.

Randomization Distribution In a hypothesis test for H0: μ1 = μ2 vs Ha: μ1 > μ2, we have a sample with x̄1 = 26 and x̄2 = 21. Where will the randomization distribution be centered? • 0 • 1 • 21 • 26 • 5 The randomization distribution is centered around the null hypothesized value, μ1 − μ2 = 0

Randomization Distribution In a hypothesis test for H0: μ1 = μ2 vs Ha: μ1 > μ2, we have a sample with x̄1 = 26 and x̄2 = 21. What do we look for on the randomization distribution? • The standard error • The center point • How extreme 26 is • How extreme 21 is • How extreme 5 is We want to see how extreme the observed difference in means is.

Quantifying Evidence • We need a way to quantify evidence against the null…

p-value The p-value is the chance of obtaining a sample statistic as extreme as (or more extreme than) the observed sample statistic, if the null hypothesis is true • The p-value can be calculated as the proportion of statistics in a randomization distribution that are as extreme as (or more extreme than) the observed sample statistic

p-value • Paul the Octopus: the p-value is the chance of getting all 8 out of 8 guesses correct, if p = 0.5 • What proportion of statistics in the randomization distribution are as extreme as p̂ = 1?

[Figure: randomization distribution from 1000 simulations, with the observed statistic marked; the proportion as extreme as the observed statistic is the p-value] • p-value = 0.004 • If Paul is just guessing, the chance of him getting all 8 correct is 0.004.
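The value 0.004 can be checked two ways. This sketch (assuming NumPy; the seed and the 100,000 simulations are arbitrary choices made here) computes the exact chance of 8 correct guesses in a row and compares it with the proportion from a simulated randomization distribution.

```python
import numpy as np

# Exact calculation: 8 independent guesses, each correct with probability 0.5.
exact_p_value = 0.5 ** 8   # 1/256
print("exact p-value:", exact_p_value)

# Simulation: proportion of simulated samples as extreme as p-hat = 1.
rng = np.random.default_rng(seed=1)  # arbitrary seed
sim_props = rng.binomial(n=8, p=0.5, size=100_000) / 8
observed = 1.0
sim_p_value = np.mean(sim_props >= observed)
print("simulated p-value:", sim_p_value)
```

The simulated value changes slightly from run to run, but it should stay close to 1/256 ≈ 0.004.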

Calculating a p-value • What kinds of statistics would we get, just by random chance, if the null hypothesis were true? (randomization distribution) • What proportion of these statistics are as extreme as our original sample statistic? (p-value)

ESP p-value • For our ESP example, the p-value is the chance of getting a sample proportion as high as 0.26, from a sample of n = 98, if p = 0.2 • Simulate a randomization distribution with p = 0.2 and n = 98, and see what proportion of simulated statistics are as extreme as 0.26 • www.lock5stat.com/statkey

ESP p-value • [Figure: randomization distribution with the observed statistic marked; the proportion as extreme as the observed statistic is the p-value] • p-value = 0.072 • If you were all just guessing randomly, the chance of us getting a sample proportion as high as 0.26 is 0.072.
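The ESP simulation can be sketched the same way, assuming NumPy and an arbitrary 10,000 simulations. One caveat: 0.26 is a rounded proportion, so comparing against it counts simulated samples with 26 or more correct out of 98; this is an assumption about how the cutoff is applied, not something the slide states.

```python
import numpy as np

rng = np.random.default_rng(seed=2)  # arbitrary seed

n = 98            # sample size
null_p = 0.2      # chance of a correct guess if H0 is true
observed = 0.26   # observed sample proportion (rounded, as on the slide)

# Simulate sample proportions assuming everyone is just guessing.
sim_props = rng.binomial(n=n, p=null_p, size=10_000) / n

# Upper-tail p-value: proportion of simulated statistics at least as high as 0.26.
p_value = np.mean(sim_props >= observed)
print("simulated p-value:", p_value)
```

Results vary from run to run, so the printed value should be near, but not necessarily equal to, the 0.072 reported from StatKey.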

Randomization Distributions • p-values can be calculated by randomization distributions: • simulate samples, assuming H0 is true • calculate the statistic of interest for each sample • find the p-value as the proportion of simulated statistics as extreme as the observed statistic • Let’s do a randomization distribution for a randomized experiment…

Cocaine Addiction • In a randomized experiment on treating cocaine addiction, 48 people were randomly assigned to take either Desipramine (a new drug), or Lithium (an existing drug), and then followed to see who relapsed • Question of interest: Is Desipramine better than Lithium at treating cocaine addiction?

Cocaine Addiction • What are the null and alternative hypotheses? • pD, pL: proportion of cocaine addicts who relapse after taking Desipramine or Lithium, respectively • H0: pD = pL, Ha: pD < pL • What are the possible conclusions? • Reject H0: Desipramine is better than Lithium • Do not reject H0: We cannot determine from these data whether Desipramine is better than Lithium

1. Randomly assign units to treatment groups [Figure: 48 participants randomly assigned, 24 to Desipramine and 24 to Lithium]

2. Conduct experiment 3. Observe relapse counts in each group (R = Relapse, N = No Relapse) [Figure: outcomes in each treatment group] • Desipramine: 10 relapse, 14 no relapse • Lithium: 18 relapse, 6 no relapse

Measuring Evidence against H0 To see if a statistic provides evidence against H0, we need to see what kind of sample statistics we would observe, just by random chance, if H0 were true

Cocaine Addiction • “by random chance” means by the random assignment to the two treatment groups • “if H0 were true” means if the two drugs were equally effective at preventing relapses (equivalently: whether a person relapses or not does not depend on which drug is taken) • Simulate what would happen just by random chance, if H0 were true…

[Figure: the 48 observed outcomes (28 relapse, 20 no relapse) as originally assigned] • Desipramine: 10 relapse, 14 no relapse • Lithium: 18 relapse, 6 no relapse

Simulate another randomization [Figure: the same 48 outcomes randomly re-assigned to the two groups] • Desipramine: 16 relapse, 8 no relapse • Lithium: 12 relapse, 12 no relapse

Simulate another randomization [Figure: the same 48 outcomes randomly re-assigned again] • Desipramine: 17 relapse, 7 no relapse • Lithium: 11 relapse, 13 no relapse

[Figure: StatKey randomization distribution (www.lock5stat.com/statkey) with the observed statistic marked; the proportion as extreme as the observed statistic is the p-value] • If the two drugs are equal regarding cocaine relapse rates, we have a 1.3% chance of seeing a difference in proportions as extreme as that observed.
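The re-randomization for this experiment can be sketched in a few lines. This is a minimal illustration assuming NumPy, not StatKey’s implementation; the seed and the 10,000 simulations are arbitrary choices. It uses only the counts reported above: 28 relapses among 48 people, with 10 of 24 relapsing on Desipramine and 18 of 24 on Lithium.

```python
import numpy as np

rng = np.random.default_rng(seed=3)  # arbitrary seed

group_size = 24
# 1 = relapse, 0 = no relapse. If H0 is true, these 48 outcomes would have
# occurred no matter which drug each person was assigned.
outcomes = np.array([1] * 28 + [0] * 20)

# Observed statistic: p-hat_D - p-hat_L
observed_diff = 10 / group_size - 18 / group_size

n_simulations = 10_000
sim_diffs = np.empty(n_simulations)
for i in range(n_simulations):
    shuffled = rng.permutation(outcomes)      # re-do the random assignment
    desipramine = shuffled[:group_size]
    lithium = shuffled[group_size:]
    sim_diffs[i] = desipramine.mean() - lithium.mean()

# Ha is pD < pL, so "as extreme" means a difference this low or lower.
# The small tolerance guards against floating-point ties at the observed value.
p_value = np.mean(sim_diffs <= observed_diff + 1e-12)
print("simulated p-value:", p_value)
```

Because the p-value comes from random simulations, it differs somewhat between runs and between tools; what matters is that it is small.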

Death Penalty • A random sample of people were asked “Are you in favor of the death penalty for a person convicted of murder?” • Did the proportion of Americans who favor the death penalty decrease from 1980 to 2010? “Death Penalty,” Gallup, www.gallup.com

Death Penalty • p1980, p2010: proportion of Americans who favor the death penalty in 1980, 2010 • H0: p1980 = p2010, Ha: p1980 > p2010 • So the sample statistic is: p̂1980 − p̂2010 = 0.02 • How extreme is 0.02, if p1980 = p2010? • StatKey

Death Penalty • p-value = 0.164 • If the proportion supporting the death penalty has not changed from 1980 to 2010, we would see differences this extreme about 16% of the time.

Alternative Hypothesis • A one-sided alternative contains either > or < • A two-sided alternative contains ≠ • The p-value is the proportion in the tail in the direction specified by Ha • For a two-sided alternative, the p-value is twice the proportion in the smaller tail

p-value and Ha • H0: μ = 0, Ha: μ > 0: Upper-tail (Right Tail) • H0: μ = 0, Ha: μ < 0: Lower-tail (Left Tail) • H0: μ = 0, Ha: μ ≠ 0: Two-tailed

Sleep versus Caffeine • Recall the sleep versus caffeine experiment from last class • μs and μc are the mean number of words recalled after sleeping and after caffeine • H0: μs = μc, Ha: μs ≠ μc (two-tailed alternative) • Let’s find the p-value! • www.lock5stat.com/statkey
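The tail rules can be written as one small helper. This is a sketch under stated assumptions: the function name `randomization_p_value` and its `alternative` labels are made up here for illustration, NumPy is assumed, and capping the two-sided result at 1 is a choice added here rather than something stated on the slides.

```python
import numpy as np

def randomization_p_value(sim_stats, observed, alternative):
    """Proportion of simulated statistics as extreme as `observed`.

    alternative: "greater" (upper tail), "less" (lower tail),
    or "two-sided" (twice the smaller tail, capped at 1).
    """
    sim_stats = np.asarray(sim_stats)
    lower_tail = np.mean(sim_stats <= observed)
    upper_tail = np.mean(sim_stats >= observed)
    if alternative == "greater":
        return float(upper_tail)
    if alternative == "less":
        return float(lower_tail)
    if alternative == "two-sided":
        return float(min(1.0, 2 * min(lower_tail, upper_tail)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

# Toy randomization distribution (made-up values, centered at the null value 0).
toy_stats = [-2, -1, 0, 1, 2]
print(randomization_p_value(toy_stats, 2, "greater"))    # upper tail
print(randomization_p_value(toy_stats, 2, "less"))       # lower tail
print(randomization_p_value(toy_stats, 2, "two-sided"))  # twice the smaller tail
```

With a real randomization distribution, `sim_stats` would hold the simulated statistics and `observed` the statistic from the original sample.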

Sleep or Caffeine for Memory? www.lock5stat.com/statkey p-value = 2 × 0.022 = 0.044

p-value and H0 • If the p-value is small, then a statistic as extreme as that observed would be unlikely if the null hypothesis were true, providing significant evidence against H0 • The smaller the p-value, the stronger the evidence against the null hypothesis and in favor of the alternative

p-value and H0 The smaller the p-value, the stronger the evidence against H0.

Summary • The randomization distribution shows what types of statistics would be observed, just by random chance, if the null hypothesis were true • A p-value is the chance of getting a statistic as extreme as that observed, if H0 is true • A p-value can be calculated as the proportion of statistics in the randomization distribution as extreme as the observed sample statistic • The smaller the p-value, the greater the evidence against H0

To Do • Read Section 4.2 • Project 1 proposal (due Wednesday, 2/19)
