Hypothesis Testing


Hypothesis Testing • Research hypotheses are formulated in terms of the outcome that the experimenter wants, and an alternative outcome that he doesn’t want • E.g. If we’re comparing scores on an exam with two groups, one with test anxiety and one without, our hypotheses are: • (1) That the group with test anxiety will score lower (expected outcome) • (2) The two groups will score the same (unexpected outcome)

Hypothesis Testing • The hypothesis that outlines the outcome that we’re expecting/hoping for is the Research Hypothesis (H1) • The hypothesis that runs counter to our expectations is the Null Hypothesis (H0)

Hypothesis Testing • We can use the sampling distribution of the mean to determine the probability that we would obtain the mean of our sample by chance • I.e. the same way we could convert a score to a z-score, and determine the probability of obtaining values higher or lower than it
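
The idea of a sampling distribution of the mean can be sketched with a small simulation. This is only an illustration under stated assumptions: a normal population with μ = 100 and σ = 15, a hypothetical sample size of n = 25, and a fixed random seed. None of these choices come from the slide itself.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, num_samples, seed=0):
    """Draw many samples of size n and return the mean of each one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


# Assumed (hypothetical) settings: mu = 100, sigma = 15, n = 25
means = simulate_sample_means(mu=100, sigma=15, n=25, num_samples=5000)
center = statistics.fmean(means)   # should sit close to mu
spread = statistics.stdev(means)   # should sit close to sigma / sqrt(n) = 3

print(f"mean of sample means: {center:.2f}")
print(f"SD of sample means:   {spread:.2f}")
```

The sample means cluster around μ, and their spread is much narrower than σ, which is why an extreme sample mean is rarer than an equally extreme single score.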

Hypothesis Testing • If the probability is low (i.e. only a 5% chance or less), we conclude that chance sampling error is unlikely to have produced our results, and that our IV did • E.g. In our comparison of people with test anxiety, our test-anxious group may, simply by chance, also be quite dumb, resulting in their poor test scores. However, if their scores are extreme enough (low), we can discount even that possibility

Hypothesis Testing • Why bother with H0 at all? • Technically, we can never prove a particular hypothesis to be true • You cannot prove the statement: “All ducks are black”, because you would have to have observations on all ducks that were, are, and ever will be (i.e. on all ducks) • You can disprove a hypothesis – “All ducks are black” can be easily proven false by seeing one white (non-black) duck • This is why, technically, we are supposed to talk about “rejecting H0” rather than “accepting H1”, and about “failing to reject H0” rather than “proving H0”

Hypothesis Testing • Beginning with the assumption that H0 is true, and trying to disprove it also maintains the scientific spirit of objectivity and skepticism • Objectivity – illustrates that we value the results of the data more than the hypothesis that, if proven, would make us happiest (H1) • Skepticism – showing that we are not convinced of even our own hypothesis until confirmed by the data

Hypothesis Testing • In our example of people with (x1) and without (x2) test anxiety, where our hypothesis is that people with anxiety will have lower IQ scores: • H0 = [x1 ≥ x2] • H1 = [x1 < x2]

Hypothesis Testing • If, instead, we were testing if the group with anxiety was different from the average student population (Hint: Note the word “different”), how would we phrase H0 and H1? • What if we were testing whether or not the two groups (x1 & x2) were equal?

Hypothesis Testing • How do we know when our sample is rare enough to reject H0? • Statistical convention says that when the probability of obtaining a mean at least as extreme as the one you’ve obtained is only 5% or less, we can say this is not due to chance • AKA the probability of rejecting H0 when it is “true” (i.e. screwing-up) = significance level/rejection level/alpha (α); the score that marks off this region is the critical value • HOWEVER THIS DOES NOT MEAN THAT 5.1% IS MEANINGLESS!

p<.05

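Hypothesis Testing • For our group with test anxiety, if their mean score on an IQ test was 70, we first convert this into a z-score (μ = 100, σ = 15) • z = (70 – 100)/15 = -2 • Note: this treats the mean like a single score for simplicity; for the mean of a sample of size n, the denominator would be the standard error, σ/√n • Since our H1 is that the group with anxiety will score lower than those without, we look at the percent in the “Lesser Portion”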

Hypothesis Testing • Looking at Table E.10, the probability of obtaining a score at or below z = -2 is .0228, or 2.3% • Since this is below the 5% convention, we would reject H0 (or “accept” H1)
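
The table lookup can be checked in code. The sketch below mirrors the slides' arithmetic (dividing by σ, as the slides do) and uses Python's standard-library `statistics.NormalDist` in place of the printed table. The variable names are illustrative only.

```python
from statistics import NormalDist

mu, sigma = 100, 15      # population values given in the slides
group_mean = 70          # mean IQ score of the test-anxiety group
alpha = 0.05             # conventional significance level

# Convert to a z-score exactly as the slides do
z = (group_mean - mu) / sigma

# Lower-tail probability: P(Z <= z) under the standard normal
p_lower = NormalDist().cdf(z)

# One-tailed decision rule: reject H0 when p is at or below alpha
reject_h0 = p_lower <= alpha

print(f"z = {z:.2f}, p = {p_lower:.4f}, reject H0: {reject_h0}")
# prints: z = -2.00, p = 0.0228, reject H0: True
```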

Hypothesis Testing • α is the p(“accepting” H1 when it is false/rejecting H0 when it is true), or of making a mistake called a Type I Error • p(“accepting” H1 when it is false) ≠ p(“accepting” H1) – the former refers to a type of error, the latter simply to an outcome • What about the p(“accepting” H0 when it is false/rejecting H1 when it is true)? • This is called a Type II Error, or β (Beta)

Hypothesis Testing • Why not make α as small as possible? • Because, all else being equal (same sample size, same true effect), as α [p(Type I Error)] decreases, β [p(Type II Error)] increases • [Figure: red = α, blue = β]
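
The trade-off between α and β can be sketched numerically. This is a minimal illustration for a one-tailed (lower-tail) z-test, and it needs assumptions the slides do not supply: a hypothetical true mean of 70 under H1 and a standard error of 15. The function name `type_ii_error` is made up for this example.

```python
from statistics import NormalDist


def type_ii_error(alpha, mu0, mu1, se):
    """Beta for a lower-tailed z-test: P(fail to reject H0 | true mean is mu1)."""
    # Cutoff below which we reject H0 (inv_cdf(alpha) is negative for alpha < .5)
    critical_mean = mu0 + NormalDist().inv_cdf(alpha) * se
    # H0 is retained whenever the observed mean lands above the cutoff
    return 1 - NormalDist(mu1, se).cdf(critical_mean)


# Hypothetical scenario: H0 mean = 100, true mean = 70, standard error = 15
alphas = [0.10, 0.05, 0.01, 0.001]
betas = [type_ii_error(a, mu0=100, mu1=70, se=15) for a in alphas]

for a, b in zip(alphas, betas):
    print(f"alpha = {a:<6} beta = {b:.3f}")
```

Under these assumed numbers, each stricter α pushes the cutoff further into the tail, so more true effects fail to reach it and β climbs.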

Hypothesis Testing • It seems like we care more about Type I Error than Type II Error. Why? • Scientists are more likely to commit a Type I Error because they are more motivated to prove their hypothesis (H1) • In Law, establishing motive is important to proving guilt; without a motive, there’s little reason to expect that a crime will occur, let alone to stringently protect against it

Hypothesis Testing • So long as we’re only willing to take a 5% chance of incorrectly rejecting H0, it doesn’t matter how we distribute this 5%, as long as the total doesn’t exceed 5% • We can place all 5% in one “tail” of the distribution if we only expect a difference in means in one direction = One-Tailed/Directional Test • We can place half of the 5% (2.5%) in each “tail”, if we have no a priori (before) hypothesis about where our mean difference will be = Two-Tailed/Non-Directional Test • The decision of which type of test to use should be made a priori based on theory, not driven by the data

Hypothesis Testing • [Figure: rejection regions for a One-Tailed Test and a Two-Tailed Test]

Hypothesis Testing • H0 and H1 with One- and Two-Tailed Tests: • For One-Tailed Tests: • If our hypothesis is that group x is lower than group y • H0 = (x ≥ y) • H1 = (x < y) • For Two-Tailed Tests: • If our hypothesis is that group x is either greater than or less than group y • H0 = (x = y) • H1 = (x ≠ y)
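
How the 5% is split between tails changes the cutoff a z-score has to clear. The sketch below assumes a z-test and uses the standard library's `NormalDist.inv_cdf`. The helper `critical_z` and the example z-score of -1.8 are hypothetical, chosen only to show a case where the two tests disagree.

```python
from statistics import NormalDist


def critical_z(alpha, tails):
    """Positive critical z for a one-tailed (tails=1) or two-tailed (tails=2) test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # Each tail receives alpha / tails of the rejection region
    return NormalDist().inv_cdf(1 - alpha / tails)


one_tailed_cutoff = critical_z(0.05, tails=1)   # about 1.645
two_tailed_cutoff = critical_z(0.05, tails=2)   # about 1.960

z = -1.8  # hypothetical observed z-score

# One-tailed, H1 = (x < y): reject only if z falls in the lower tail
reject_one_tailed = z <= -one_tailed_cutoff
# Two-tailed, H1 = (x != y): reject if z is extreme in either direction
reject_two_tailed = abs(z) >= two_tailed_cutoff

print(f"one-tailed cutoff: {one_tailed_cutoff:.3f}, reject: {reject_one_tailed}")
print(f"two-tailed cutoff: {two_tailed_cutoff:.3f}, reject: {reject_two_tailed}")
```

The same z-score is significant under the directional test but not under the non-directional one, which is why the choice has to be made before looking at the data.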

Hypothesis Testing • Psychologists can be sneaky bastards and covertly increase α by testing one hypothesis many times by: • Evaluating one hypothesis with many different statistical tests • Using more than one measure to operationalize one DV • E.g. Measuring depression with both the Beck Depression Inventory-II (BDI-II) and the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) = testing depression twice = roughly doubling your α
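
The inflation of α can be made concrete with the familywise error rate, the chance of at least one Type I Error across all the tests. The sketch assumes the tests are independent, which two depression measures on the same people would not strictly be, so treat the numbers as a rough guide. The function name is illustrative.

```python
def familywise_error(alpha, num_tests):
    """P(at least one Type I Error) across num_tests independent tests."""
    # Probability that every test avoids a false rejection is (1 - alpha)^k
    return 1 - (1 - alpha) ** num_tests


for k in (1, 2, 3, 6):
    print(f"{k} test(s) at alpha = .05 -> familywise error = "
          f"{familywise_error(0.05, k):.4f}")
```

With two tests the rate is about .0975, just under double the nominal .05.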

Hypothesis Testing • What should you do to prevent this from happening? • If you’re testing one hypothesis many different ways or with many measures, adjust α accordingly w/ the Bonferroni Correction • Note: NOT the same as the Beeferoni™ Correction, which prevents incorrect preparation of Chef Boyardee™ products • Testing w/ 2 tests; Use α = .05/2 = .025 • Testing w/ 3 measures of one construct; Use α = .05/3 = .0167 • Testing w/ 2 tests and 3 measures; Use α = .05/6 = .0083
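
The correction is a single division, sketched below. The helper `bonferroni_alpha` is hypothetical, and it follows the slides' convention of counting comparisons as tests × measures.

```python
def bonferroni_alpha(alpha, num_tests=1, num_measures=1):
    """Per-comparison alpha after a Bonferroni correction."""
    comparisons = num_tests * num_measures  # the slides' tests x measures rule
    if comparisons < 1:
        raise ValueError("need at least one comparison")
    return alpha / comparisons


two_tests = bonferroni_alpha(0.05, num_tests=2)                     # .05 / 2
three_measures = bonferroni_alpha(0.05, num_measures=3)             # .05 / 3
tests_and_measures = bonferroni_alpha(0.05, num_tests=2, num_measures=3)  # .05 / 6

print(f"{two_tests:.4f} {three_measures:.4f} {tests_and_measures:.4f}")
# prints: 0.0250 0.0167 0.0083
```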

Hypothesis Testing • Example: • Your hypothesis is that males and females will differ in degree of instrumental aggression (IA = aggression designed to obtain an end). IA is measured with the Instrumental Aggression Scale (IAS) and the Positive and Negative Affect Schedule (PANAS), and the groups are evaluated with both ANOVA and SEM • What is your corrected α-level?

Hypothesis Testing • Three of the Ten Commandments of Statistics: • 1. P-Values indicate the probability of obtaining findings at least as extreme as yours by chance alone (i.e. if H0 were true), NOT the strength of the relationship between an IV and DV • I.e. NEVER SAY “In my experiment evaluating the influence of coffee (the IV) on people’s activity levels (the DV), I found highly significant results at p = .000001, indicating that coffee produces a lot of activity in people” • CORRECT – “If coffee had no effect, the likelihood of seeing a boost in activity levels this large through sampling error (i.e. chance) alone would be only .000001”

Hypothesis Testing • Three of the Ten Commandments of Statistics: • 2. p = .052, .055, etc. is not “insignificant”, and does not mean that a relationship between your IV and DV does not exist, just that it did not meet “conventional” levels of significance. • 3. When testing a hypothesis multiple ways, always use some corrected level of α (e.g. the Bonferroni Correction).
