Statistical Power
Power in a nutshell • Get the biggest sample you can • Benefits: • Sample is more representative of the population • More likely to discover the true relationship • Reminder: Some things are independent or very nearly so (ES = 0).
Maximum Power! • In statistics we want to give ourselves the best chance to find a significant result if one exists. • Power represents the probability of finding that significant result • p(reject H0 | H0 is false) • As we have discussed, it is directly related to the type II error rate (β) • Power = 1 - β
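The definition above, p(reject H0 | H0 is false), can be illustrated by simulation. This is a minimal sketch, not anything from the slides: it assumes a two-sided one-sample z test with a known SD of 1, and the function name `simulated_power` and its parameters are made up for illustration.

```python
import random
from statistics import NormalDist

def simulated_power(d, n, alpha=0.05, reps=2000, seed=1):
    """Estimate power of a two-sided one-sample z test by simulation.

    Assumption: scores are normal with known SD = 1 and true mean d,
    so H0 (mean = 0) is false whenever d != 0.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(d, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * n ** 0.5  # test statistic when sigma = 1
        if abs(z) > z_crit:
            rejections += 1
    # Proportion of simulated studies that rejected H0
    return rejections / reps

power = simulated_power(d=0.5, n=32)
beta = 1 - power  # estimated type II error rate
print(f"power ~ {power:.2f}, type II error rate ~ {beta:.2f}")
```

Setting `d=0` makes H0 true, in which case the rejection rate should sit near alpha rather than near the power.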
Two kinds of power analysis • A priori • Used when planning your study • What sample size is needed to obtain a certain level of power? • Post hoc • Used when evaluating a study • What chance did you have of significant results? • Not really useful. • If you do the power analysis and conduct your analysis accordingly, then you did what you could. To say afterwards, “I would have found a difference but didn’t have enough power” isn’t going to impress anyone.
How many subjects? • How many research subjects does it take to screw in a light bulb? • At least 300 if you want the bulb to have adequate power
A priori power approach • Can use the relationship of sample size (N), effect size (d), what the alternative hypothesis distribution is centered on (the noncentrality parameter δ) plus our specified alpha (α) to calculate how many subjects we need to run • Decide on your α level • Decide an acceptable level of power/type II error rate • Figure out the effect size you are looking for • Calculate N
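The four steps above can be sketched in a few lines. This is only a sketch under stated assumptions: a two-sided comparison of two independent group means, with a normal approximation in place of the noncentral t distribution (exact t-based methods give a slightly larger n). The function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Normal approximation (an assumption of this sketch):
        n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # step 1: chosen alpha level
    z_power = z.inv_cdf(power)          # step 2: chosen power (1 - beta)
    # step 3 is the effect size d; step 4 is solving for N
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

print(n_per_group(d=0.5))
```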
A priori Effect Size? • Figure out an effect size before I run my study? • Several ways to do this: • Base it on substantive knowledge • What you know about the situation and scale of measurement • Base it on previous research • Use conventions
An acceptable level of power? • Why not set power at .99? • Practicalities • Cost of increasing power, usually done through increasing n, can be high • Notice how for small effects one needs enormous sample sizes to be able to reject the null
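To put rough numbers on the cost of extra power, the sketch below tabulates approximate sample sizes for small, medium, and large effects (d = .2, .5, .8, the usual conventions) at three power levels. It assumes a two-sided two-sample comparison at alpha = .05 and a normal approximation; `n_per_group` is an illustrative helper, not a library function.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, power, alpha=0.05):
    """Approximate n per group (two-sided, two-sample, normal approximation)."""
    z = NormalDist()
    return ceil(2 * ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / d) ** 2)

# Conventional small, medium, and large effect sizes
for d in (0.2, 0.5, 0.8):
    row = [n_per_group(d, power) for power in (0.80, 0.90, 0.99)]
    print(f"d = {d}: n per group at power .80/.90/.99 = {row}")
```

Under these assumptions, moving from power .80 to .99 more than doubles the required n at every effect size, and small effects multiply whatever that cost is.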
Alternative way of thinking about power and effect size • As we are interested in getting an interval estimate on an effect size, we can estimate the sample size needed for an interval of a certain width • Moderate effect (d = .5), interval width .5 (.25 on either side) • From the R library MBESS • ss.aipe.smd(delta=.5, conf.level=.90, width=.50) • Would need 90… per group.
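The same idea can be approximated by hand. The sketch below is not the MBESS routine: it assumes equal group sizes and the common large-sample variance of d, var(d) ≈ 2/n + d²/(4n), and solves for the n that gives the requested full interval width. The function name `n_for_ci_width` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_for_ci_width(d, width, conf_level=0.95):
    """Approximate n per group so the CI for d has the given full width.

    Assumption: equal groups and the large-sample variance
        var(d) ~ 2/n + d**2 / (4*n)
    so that width = 2 * z * sqrt(var(d)).
    """
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return ceil((2 * z / width) ** 2 * (2 + d ** 2 / 4))

# Same inputs as the slide: d = .5, 90% interval, full width .50
print(n_for_ci_width(d=0.5, width=0.50, conf_level=0.90))
```

Halving the target width roughly quadruples the required n, which is why narrow intervals on an effect size are expensive.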
Howell’s general rule • Look for big effects or • Use big samples • You may now start to understand how little power many of the studies in psych have, considering they are often dealing with small effects • Many seem to think that if they use the ‘rule of thumb’ for a single sample size (N = 30), which doesn’t even hold that often for that case, then power is solved too • This is clearly not the case • By the way, you’d need N = 200 for a single sample and a small effect (d = .20)
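A quick check of the N = 30 rule of thumb against a small effect, as a rough sketch. It assumes a two-sided one-sample test at alpha = .05 and uses a normal approximation; `power_one_sample` is an illustrative name.

```python
from statistics import NormalDist

def power_one_sample(d, n, alpha=0.05):
    """Approximate power of a two-sided one-sample test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * n ** 0.5  # how far the H1 distribution sits from H0, in SE units
    # Probability of landing in either rejection region when H1 is true
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

for n in (30, 200):
    print(f"N = {n}: power ~ {power_one_sample(0.2, n):.2f}")
```

Under these assumptions, N = 30 leaves power for d = .20 well below one in four, while N = 200 brings it to roughly the conventional .80.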
Post hoc power • If you fail to reject the null hypothesis, you might want to know what chance you had of finding a significant result, as a way of defending the failure • As many point out, this is a little dubious • Better used to help understand (vaguely) the likelihood of other experiments replicating your results • But again, your sample size tells you that already
Something to consider • Even the useful form of power analysis (for sample size calculation) involves statistical significance as its focus • While it gives you something to shoot for, our real interest regards the effect size itself and how comfortable we are with its estimation • Emphasizing effect size over statistical significance in a sense de-emphasizes the power problem • Small samples will result in very wide confidence intervals for effect size, and that would explicitly reflect a lack of understanding of its true nature
Reminder: Factors affecting Power • Effect size • Bigger effects are easier to spot • Alpha level • Larger alpha = greater chance for type I error = more ‘liberal’ = less chance for type II error = greater power • Sample size • Bigger is better
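Each factor can be varied in turn while holding the others fixed. A minimal sketch, assuming a two-sided two-sample comparison and a normal approximation; the baseline values (d = .5, 30 per group, alpha = .05) are arbitrary choices for illustration, and `power_two_sample` is a hypothetical helper.

```python
from statistics import NormalDist

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power for a two-sided, two-sample comparison of means."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5  # standardized distance between H0 and H1
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

base = power_two_sample(d=0.5, n_per_group=30)
bigger_effect = power_two_sample(d=0.8, n_per_group=30)
larger_alpha = power_two_sample(d=0.5, n_per_group=30, alpha=0.10)
bigger_sample = power_two_sample(d=0.5, n_per_group=60)

print(f"baseline       {base:.2f}")
print(f"bigger effect  {bigger_effect:.2f}")
print(f"larger alpha   {larger_alpha:.2f}")
print(f"bigger sample  {bigger_sample:.2f}")
```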