Chapter 9


Presentation Transcript


  1. Chapter 9 Power

  2. Decisions • A null hypothesis significance test tells us the probability of obtaining our results when the null hypothesis is true: p(Results | H0 is true) • If that probability is small, smaller than our significance level (α), it is probable that H0 is not true, and we reject it
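
A minimal sketch of this decision rule, assuming a one-tailed, one-sample z-test with a known population σ (all numbers below are invented for illustration):

```python
from scipy import stats

# Hypothetical setup: H0 says mu = 0; we observe sample mean M from
# n scores with a known population sigma (numbers invented).
mu0, sigma, n, M = 0.0, 10.0, 25, -4.2
alpha = 0.05

se = sigma / n ** 0.5          # standard error of the mean
z = (M - mu0) / se             # test statistic
p = stats.norm.cdf(z)          # one-tailed: p(our M or less | H0 true)

print(f"z = {z:.2f}, p = {p:.4f}")
print("reject H0" if p < alpha else "fail to reject H0")
```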

  3. Errors in Hypothesis Testing • Sometimes we make the correct decision regarding H0 • Sometimes we make mistakes when conducting hypothesis tests • Remember: we are talking about probability theory • A less-than-.05 chance doesn’t mean “no chance at all”

  4. Errors in Hypothesis Testing • If H0 is true: rejecting it is a Type 1 error (probability α); failing to reject it is a correct decision (probability 1 - α) • If H0 is false: rejecting it is a correct decision (power, probability 1 - β); failing to reject it is a Type 2 error (probability β)

  5. Type 1 Errors • The null hypothesis is correct (in reality) but we have rejected it in favor of the alternative hypothesis • The probability of making a Type 1 error is equal to α, the significance level we have selected • α - the probability of rejecting a null hypothesis when it is true

  6. Type 2 Errors • The null hypothesis is incorrect, but we have failed to reject it in favor of the alternative hypothesis • The probability of a Type 2 error is signified by β, and the “power” of a statistical test is 1 - β • Power (1 - β) - the probability of rejecting a null hypothesis when it is false
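
Both error rates can be made concrete with a small simulation; this is a sketch assuming a one-tailed, one-sample z-test, with μ0 = 0, σ = 10, and n = 25 all hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu0, sigma, n, alpha = 0.0, 10.0, 25, 0.05
z_crit = stats.norm.ppf(1 - alpha)   # one-tailed critical value
reps = 100_000

def reject_rate(true_mu):
    # Draw many sample means from the true distribution, count rejections.
    se = sigma / np.sqrt(n)
    means = rng.normal(true_mu, se, reps)
    return np.mean((means - mu0) / se > z_crit)

print("Type 1 error rate (H0 true):    ", reject_rate(0.0))  # ~ alpha = .05
print("Power when true mu = 5 (d = .5):", reject_rate(5.0))  # ~ 1 - beta
```

With a true mean of 5 (a moderate effect, d = .5), the rejection rate lands near .80, which previews the worked power example at the end of the chapter.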

  7. More on α and β

  8. Relation between α and β • Although they are related, the relation is complex • If α = .05, the probability of making a correct decision when the null hypothesis is true is 1 – α = .95 • What if the null hypothesis is not true? • The probability of rejecting the null when it is not true is 1 - β

  9. Relation between α and β • In general, we do not set β, but it is a direct outcome of our experiment and can be determined (we can estimate β by designing our experiment properly) • β is generally greater than α • One way to decrease β is by increasing α • But we don’t want to do that. Why, you ask?

  10. α and β reconsidered • Minimize the chances of finding an innocent man guilty vs. finding a guilty man innocent • Likewise, we would rather reduce the likelihood of finding an effect when there isn’t one (making a Type 1 error - rejecting H0 when H0 is true) than reduce the likelihood of missing an effect when there is one (making a Type 2 error - not rejecting H0 when H0 is false)

  11. Power? • The probability of rejecting a false null hypothesis • One of the two types of correct decision • Addresses the Type 2 error: “not finding any evidence of an effect when one is there”

  12. More (on) Power • While most attention focuses on Type 1 errors, you can’t be naïve about Type 2 errors anymore • Thus, power analyses are becoming the norm in psychological statistics (or they should be)

  13. Hypothesis Testing & Power • [Figure: the sampling distribution of the sample mean when H0 is true, centered at the μ specified in H0]

  14. H0: μ = 0 • [Figure: the H0 sampling distribution centered at 0, with our sample mean M marked on the axis]

  15. H0: μ = 0 • [Figure: the shaded tail area beyond our sample mean M is the probability of obtaining our sample mean (or less) given that the null hypothesis is true]

  16. H0: μ = 0 • We reject the null that our sample came from the distribution specified by H0 because, if it were true, our sample mean M would be highly improbable

  17. H0: μ = 0 • Improbable means “not likely” but not “impossible”, so the probability that we made an error and rejected H0 when it was true is this tail area (labeled “OOPS!” in the figure)

  18. H0: μ = 0 • This tail area is our “p-value”, and as long as it is less than α, we reject H0

  19. H0: μ = 0 • As a reminder and a little visual help: α defines the critical value and the rejection region [Figure: the critical value marking off the rejection region]

  20. H0: μ = 0 • For any sample mean that falls within the rejection region (beyond the critical value(s)), we reject H0
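
A sketch of how α fixes the critical value and the rejection region, again assuming a one-tailed (lower) z-test; the specific sample z values are hypothetical:

```python
from scipy import stats

alpha = 0.05
z_crit = stats.norm.ppf(alpha)   # lower-tail critical value, ~ -1.645
print(f"critical value: z = {z_crit:.3f}")

for z in (-2.10, -1.20):         # hypothetical sample results
    decision = "reject H0" if z < z_crit else "fail to reject H0"
    print(f"z = {z:+.2f} -> {decision}")
```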

  21. Let’s say, though, that our sample mean really comes from a distribution different from the one specified by H0, one that’s consistent with HA

  22. We assume that this second sampling distribution, consistent with HA, is normally distributed around our sample mean M

  23. If H0 is false, the probability of rejecting it is the area under this second distribution that falls within the rejection region

  24. Namely, this area [Figure: the portion of the HA distribution falling inside the rejection region, shaded]

  25. And, as we all know, the probability of rejecting a false H0 is POWER [Figure: that same area, now labeled POWER]

  26. [Figure: under the HA distribution, the area inside the rejection region is POWER (1 - β); the remaining area is β]

  27. [Figure: under the H0 distribution, the area inside the rejection region is α; the remaining area is 1 - α]
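
All four areas in these two figures can be computed directly. A sketch assuming the H0 sampling distribution is normal with mean 0 and the HA distribution is normal with mean μA, sharing standard error σM; both σM and μA are invented for illustration:

```python
from scipy import stats

sigma_m = 2.0   # standard error of the mean (hypothetical)
mu_a = 5.0      # mean under HA (hypothetical)
alpha = 0.05

# One-tailed critical value on the sample-mean scale, set under H0.
crit = stats.norm.ppf(1 - alpha, loc=0.0, scale=sigma_m)

beta = stats.norm.cdf(crit, loc=mu_a, scale=sigma_m)  # HA area below crit
power = 1 - beta

print(f"critical value = {crit:.2f}")
print(f"alpha = {alpha:.3f}   1 - alpha = {1 - alpha:.3f}")
print(f"beta  = {beta:.3f}   power     = {power:.3f}")
```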

  28. Factors that influence power: α [Figure: a larger α widens the rejection region and increases power]

  29. Factors that influence power: variability [Figure: less variability narrows both distributions and increases power]

  30. Factors that influence power: sample size [Figure: a larger n shrinks the standard error and increases power]

  31. Factors that influence power: effect size [Figure: when the difference between the H0 and HA means is increased, power increases]

  32. Factors that Influence Power • α - significance level (the probability of making a Type 1 error)

  33. Parametric Statistical Tests • Parametric statistical tests, those that test hypotheses about specific population parameters, are generally more powerful than corresponding non-parametric tests • Therefore, parametric tests are preferred to non-parametric tests when possible

  34. Variability • Measure more accurately • Design a better experiment • Standardize procedures for acquiring data • Use a dependent-samples design

  35. Directional Alternative Hypothesis • A directional HA specifies which tail of the distribution is of interest (e.g., HA is specified as < or > some value rather than “different than” or ≠)
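
Concentrating all of α in one tail moves the critical value closer to the null mean, which is where the power gain comes from; a quick sketch comparing the two critical values for a z-test at α = .05:

```python
from scipy import stats

alpha = 0.05
print(f"one-tailed critical z:  {stats.norm.ppf(1 - alpha):.3f}")     # ~1.645
print(f"two-tailed critical z: ±{stats.norm.ppf(1 - alpha / 2):.3f}") # ~1.960
# A hypothetical result at z = 1.8 would be significant one-tailed
# but not two-tailed.
```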

  36. Increasing Sample Size (n) • σM, the standard error of the mean, decreases as sample size increases: σM = σ/√n

  37. Increasing Sample Size (here σ = 10) • n = 25: σM = 2.0 • n = 100: σM = 1.0 • n = 400: σM = 0.5
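
Those three values all follow from σM = σ/√n with σ = 10 (the population SD implied by the slide); a one-line check:

```python
sigma = 10.0   # population SD implied by the slide
for n in (25, 100, 400):
    print(f"n = {n:>3}: sigma_M = {sigma / n ** 0.5:.1f}")
```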

  38. Effect Size • Effect size is directly related to power

  39. Effect Size • Effect size - a measure of the magnitude of the effect of the intervention being studied • The effect is related to the magnitude of the difference between a hypothesized mean (what we might think it is, given the intervention) and the population mean (μ)

  40. Cohen’s d • .2 = small effect • .5 = moderate effect • .8 = large effect • For each statistical test, a separate formula is needed to determine d, but when you do this, results are directly comparable regardless of the test used
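
A sketch of one common form of d for a one-sample comparison: the difference between a hypothesized (or observed) mean and the population mean, expressed in population-SD units. The means and SD below are hypothetical:

```python
def cohens_d(m, mu, sigma):
    # Standardized difference between a hypothesized/observed mean m
    # and the population mean mu, in population-SD units.
    return (m - mu) / sigma

# Hypothetical numbers on an IQ-like scale (mu = 100, sigma = 10):
d = cohens_d(m=105.0, mu=100.0, sigma=10.0)
print(f"d = {d:.1f}")   # 0.5 -> a moderate effect by the benchmarks above
```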

  41. Implications of Effect Size • A study was conducted by Dr. Johnson on productivity in the workplace • He compared Method A with Method B • Using an n = 80, Johnson found that A was better than B at p < .05 • (he rejected the null that A and B were identical, and accepted the directional alternative that A was better)

  42. Implications (cont.) • Dr. Sockloff, who invented Method B, disputed these claims and repeated the study • Using an n = 20, Sockloff found no difference between A and B at p > .30 • (he did not reject the null that A and B were equal)

  43. How can this be? • In both cases the effect size was determined to be .5 (the effectiveness of Method A was identical in both studies) • However, Johnson could detect an effect because he had the POWER • Sockloff had very low power, and did not detect an effect (he had a low probability of rejecting an incorrect null)

  44. Power and Effect Size • A desirable level of power is .80 (Cohen, 1965) • Thus, β = .20 • By also setting an effect size (the magnitude of the smallest discrepancy that, if it exists, we would be reasonably sure of detecting), we can find an appropriate sample size, n

  45. Method for Determining Sample Size (n) • A priori, or before the study • Directional or non-directional? • Set the significance level, α • What level of power do we want? • Use table B to look up δ (“delta”) • Determine the effect size and use: n = (δ/d)²

  46. Example of Power Analysis • α = .05 • 1 - β = .80 • Look up in table B: δ = 2.5 • d = .5 (moderate effect) • n = (δ/d)² = (2.5/.5)² = 25 • So, in order to detect a moderate effect (d = .5) with power of .80 and α of .05, we need 25 subjects in our study
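
A sketch of the same calculation with the table lookup replaced by the normal approximation δ = z(1 - α) + z(power), assuming a one-tailed test; this reproduces the tabled δ ≈ 2.5:

```python
import math
from scipy import stats

alpha, power, d = 0.05, 0.80, 0.5

# delta via the normal approximation (one-tailed test assumed)
delta = stats.norm.ppf(1 - alpha) + stats.norm.ppf(power)   # ~2.49
n = (delta / d) ** 2

print(f"delta = {delta:.2f}, n = (delta/d)^2 = {n:.1f}")
print(f"round up: n = {math.ceil(n)} subjects")
```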

  47. ***Main Point*** (impress your Research Methods prof) • Good experimental design always utilizes power and effect size analyses prior to conducting the study

  48. Inductive Leap • The probability of obtaining a particular result assuming the null is true (the p level) is driven by a measure of effect size times a measure of the size of the sample: p = effect size × size of study • Therefore, p (the probability of a Type 1 error) is influenced by both the size of the effect and the size of the study • Remember: if we want to reject the null, we want a small p (less than α)
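
For a one-sample t-test this multiplication can be written out explicitly (a standard identity, not shown on the slide): t = (M - μ0) / (s/√n) = [(M - μ0)/s] × √n = d × √n. A bigger effect (d) or a bigger study (√n) both push the test statistic further into the rejection region, which is why either one shrinks p.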
